RE: Directory / File Reading Patterns

2015-01-18 Thread Bob Tiernay
You may also want to keep an eye on SPARK-5182 / SPARK-5302, which may help if you are using Spark SQL. It should be noted that this is possible with HiveContext today. Cheers, Bob

Re: Directory / File Reading Patterns

2015-01-18 Thread Sean Owen
I think that putting part of the data (only) in a filename is an anti-pattern, but we sometimes have to play these where they lie. You can list all the directory paths containing the CSV files, map them each to RDDs with textFile, transform the RDDs to include info from the path, and then simply
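The approach described in this message can be sketched roughly as follows in PySpark. This is only an illustration, not code from the thread: `read_tagged` and `path_fields` are hypothetical helper names, the `*.csv` glob is an assumption about the file layout, and because the snippet is cut off before its last step, combining the per-directory RDDs with `SparkContext.union` is also an assumption.

```python
import posixpath


def path_fields(dir_path):
    """Hypothetical helper: split a directory path into its components,
    e.g. "/data/2015/01" -> ("data", "2015", "01")."""
    return tuple(part for part in dir_path.split("/") if part)


def read_tagged(sc, dir_paths, pattern="*.csv"):
    """Build one RDD of (path_fields, line) pairs from several directories.

    `sc` is expected to be a SparkContext; `dir_paths` is a list of
    directory paths that has already been collected by the caller.
    """
    tagged = []
    for dir_path in dir_paths:
        fields = path_fields(dir_path)
        # textFile accepts a glob, so this reads every CSV in the directory.
        lines = sc.textFile(posixpath.join(dir_path, pattern))
        # Bind `fields` as a default argument so each lambda keeps its own
        # value; RDDs are evaluated lazily, after the loop has finished.
        tagged.append(lines.map(lambda line, f=fields: (f, line)))
    # Assumed final step: merge the per-directory RDDs into a single RDD.
    return sc.union(tagged)
```

With a live SparkContext, something like `read_tagged(sc, ["/data/2015/01", "/data/2015/02"]).take(5)` would show a few tagged lines; both paths are made up for the example. Each record carries its path components, so later stages can parse the CSV line and the path-derived fields together.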

Directory / File Reading Patterns

2015-01-17 Thread Steve Nunez
Hello Users, I've got a real-world use case that seems common enough that its pattern would be documented somewhere, but I can't find any references to a simple solution. The challenge is that data is getting dumped into a directory structure, and that directory structure itself contains