Try something like this:

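 import java.util.Date
 import org.apache.avro.generic.GenericRecord
 import org.apache.avro.mapred.AvroKey
 import org.apache.avro.mapreduce.AvroKeyInputFormat
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.Path
 import org.apache.hadoop.io.NullWritable
 import org.apache.hadoop.mapreduce.Job
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
 import org.apache.spark.SparkContext

 def readGenericRecords(sc: SparkContext, inputDir: String,
     startDate: Date, endDate: Date) = {

   // assuming a list of paths
   val paths: Seq[String] = getInputPaths(inputDir, startDate, endDate)

   val job = Job.getInstance(new Configuration(sc.hadoopConfiguration))

   // Register every path except the first on the job configuration;
   // the first one is passed to newAPIHadoopFile below.
   paths.drop(1).foreach(p => FileInputFormat.addInputPath(job, new Path(p)))

   // Argument order: input format class, key class, value class.
   sc.newAPIHadoopFile(paths.head,
     classOf[AvroKeyInputFormat[GenericRecord]],
     classOf[AvroKey[GenericRecord]],
     classOf[NullWritable],
     job.getConfiguration())

 }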
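
The getInputPaths helper used above is not shown anywhere in this thread.
A minimal sketch of one possible version follows, assuming the yyyy/MM/dd
directory layout and the out-r-*.avro file pattern from the paths quoted
below. The object name InputPaths and the use of the system time zone are
my own assumptions, not something from the original code.

```scala
import java.time.ZoneId
import java.time.format.DateTimeFormatter
import java.util.Date

object InputPaths {
  // Assumed layout: <inputDir>/yyyy/MM/dd/out-r-*.avro
  private val dayFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd")

  // Hypothetical helper: one glob per day from startDate to endDate, inclusive.
  def getInputPaths(inputDir: String, startDate: Date, endDate: Date): Seq[String] = {
    val zone = ZoneId.systemDefault()
    val start = startDate.toInstant.atZone(zone).toLocalDate
    val end = endDate.toInstant.atZone(zone).toLocalDate
    Iterator.iterate(start)(day => day.plusDays(1))
      .takeWhile(day => !day.isAfter(end))
      .map(day => s"$inputDir/${day.format(dayFormat)}/out-r-*.avro")
      .toList
  }
}
```

If startDate is after endDate the list is empty, so paths.head in
readGenericRecords would throw. Check for that case before calling
newAPIHadoopFile.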

2015-05-27 10:55 GMT+02:00 ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>:

>
>  def readGenericRecords(sc: SparkContext, inputDir: String,
>      startDate: Date, endDate: Date) = {
>
>     val path = getInputPaths(inputDir, startDate, endDate)
>
>    sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
>      AvroKeyInputFormat[GenericRecord]]("/A/B/C/D/D/2015/05/22/out-r-*.avro")
>
>   }
>
>
> This is my method. Can you show me where I should modify it to use
> FileInputFormat? If you add the paths there, what should you pass when
> invoking newAPIHadoopFile?
>
> On Wed, May 27, 2015 at 2:20 PM, Eugen Cepoi <cepoi.eu...@gmail.com>
> wrote:
>
>> You can do that using FileInputFormat.addInputPath
>>
>> 2015-05-27 10:41 GMT+02:00 ayan guha <guha.a...@gmail.com>:
>>
>>> What about /blah/*/blah/out*.avro?
>>> On 27 May 2015 18:08, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepuj...@gmail.com> wrote:
>>>
>>>> I am doing that now.
>>>> Is there no other way?
>>>>
>>>> On Wed, May 27, 2015 at 12:40 PM, Akhil Das
>>>> <ak...@sigmoidanalytics.com> wrote:
>>>>
>>>>> How about creating two RDDs and combining them with
>>>>> sc.union(first, second)?
>>>>>
>>>>> Thanks
>>>>> Best Regards
>>>>>
>>>>> On Wed, May 27, 2015 at 11:51 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I have this piece of code:
>>>>>>
>>>>>> sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
>>>>>> AvroKeyInputFormat[GenericRecord]](
>>>>>> "/a/b/c/d/exptsession/2015/05/22/out-r-*.avro")
>>>>>>
>>>>>> It takes "/a/b/c/d/exptsession/2015/05/22/out-r-*.avro" as its
>>>>>> input.
>>>>>>
>>>>>> I want to give a second directory as input, but this is invalid
>>>>>> syntax:
>>>>>>
>>>>>> ("/a/b/c/d/exptsession/2015/05/22/out-r-*.avro",
>>>>>> "/a/b/c/d/exptsession/2015/05/21/out-r-*.avro")
>>>>>>
>>>>>> OR
>>>>>>
>>>>>> ("/a/b/c/d/exptsession/2015/05/22/out-r-*.avro,
>>>>>> /a/b/c/d/exptsession/2015/05/21/out-r-*.avro")
>>>>>>
>>>>>>
>>>>>> Please suggest.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Deepak
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Deepak
>>>>
>>>>
>>
>
>
> --
> Deepak
>
>
