I think the reason is simply that there is no longer an explicit
min-partitions argument for Hadoop InputSplits in the new Hadoop APIs.
At least, I didn't see one when I glanced just now.

However, you should be able to get the same effect by setting a
Configuration property, which you can pass in through the
newAPIHadoopFile method. You set it as a suggested maximum split size
rather than a suggested minimum number of splits.

Although I think the old config property mapred.max.split.size is
still respected, you may try
mapreduce.input.fileinputformat.split.maxsize instead, which appears
to be the intended replacement in the new APIs.
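
Untested, but something along these lines should do it. I'm using
TextInputFormat and a made-up path just for illustration; substitute
your AvroParquet InputFormat and its key/value classes:

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

  // Start from the SparkContext's Hadoop configuration and add the hint.
  val conf = new Configuration(sc.hadoopConfiguration)
  // Cap each split at 64 MB (value is in bytes), so more splits -- and
  // hence more partitions -- are produced.
  conf.set("mapreduce.input.fileinputformat.split.maxsize",
    (64 * 1024 * 1024).toString)
  // The old property name, which I believe is still honored:
  // conf.set("mapred.max.split.size", (64 * 1024 * 1024).toString)

  val rdd = sc.newAPIHadoopFile(
    "hdfs:///path/to/data",    // your input path
    classOf[TextInputFormat],  // swap in your AvroParquet InputFormat
    classOf[LongWritable],     // ...and its key class
    classOf[Text],             // ...and its value class
    conf)

With a smaller maximum split size you should see the partition count go
up without a repartition() shuffle.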

On Mon, Sep 15, 2014 at 9:35 PM, Eric Friedman
<eric.d.fried...@gmail.com> wrote:
> sc.textFile takes a minimum # of partitions to use.
>
> is there a way to get sc.newAPIHadoopFile to do the same?
>
> I know I can repartition() and get a shuffle.  I'm wondering if there's a
> way to tell the underlying InputFormat (AvroParquet, in my case) how many
> partitions to use at the outset.
>
> What I'd really prefer is to get the partitions automatically defined based
> on the number of blocks.
