Re: Configuring custom input format
Hi, I'm trying to build a custom input format for a CSV file as well. Could you share a bit more about what you read as input and what you have implemented so far? I'll try to replicate the same setup, and if I find anything interesting at my end I'll let you know.

Thanks,
Harihar
Re: Configuring custom input format
How are you creating the object in your Scala shell? Maybe you can write a function that directly returns the RDD, without assigning the Job object to a temporary variable.

Matei

On Nov 5, 2014, at 2:54 PM, Corey Nolet cjno...@gmail.com wrote:

The closer I look at the stack trace in the Scala shell, the more it appears to be the call to toString() that is causing the construction of the Job object to fail. Is there a way to suppress this output, since it appears to be hindering my ability to new up this object?
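A minimal sketch of that workaround in the spark-shell, assuming Spark 1.x and a hypothetical custom format MyInputFormat extending the new-API InputFormat[LongWritable, Text] (the format name and key/value types are illustrative, not taken from this thread). Because the Job value never escapes the function, the REPL has nothing to echo and never calls toString() on it:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.spark.rdd.RDD

def loadWithCustomFormat(path: String): RDD[(LongWritable, Text)] = {
  // The Job only carries configuration here; it is never submitted,
  // and it stays local to the function so the REPL never prints it.
  val job = Job.getInstance(sc.hadoopConfiguration)
  FileInputFormat.addInputPath(job, new Path(path))
  sc.newAPIHadoopRDD(
    job.getConfiguration,
    classOf[MyInputFormat],   // hypothetical custom InputFormat[LongWritable, Text]
    classOf[LongWritable],
    classOf[Text])
}

val rdd = loadWithCustomFormat("hdfs:///data/input")   // only the RDD is echoed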
Re: Configuring custom input format
I was wiring up my job in the shell while I was learning Spark/Scala. I'm getting more comfortable with both now, so I've mostly been testing through IntelliJ with mock data as inputs.

I think the problem lies more with Hadoop than with Spark: the Job object seems to check its state and throw an exception when toString() is called before the job has actually been submitted.

On Tue, Nov 25, 2014 at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

How are you creating the object in your Scala shell? Maybe you can write a function that directly returns the RDD, without assigning the object to a temporary variable.

Matei
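To make the failure mode concrete, a sketch of what appears to be happening (based on the description above and the Hadoop 2.5.0 Job source linked earlier in the thread): the Scala REPL echoes every value bound at the top level by calling toString() on it, and Hadoop's Job.toString() verifies the job's state before printing, so a freshly constructed, not-yet-submitted Job fails as soon as the shell tries to print it:

// In the spark-shell: constructing the Job succeeds, but the REPL then
// calls toString() on the bound value to echo it, and that call fails
// because the job has not been submitted yet.
val job = org.apache.hadoop.mapreduce.Job.getInstance(sc.hadoopConfiguration)
// => throws IllegalStateException (the job is still in its initial,
//    unsubmitted state rather than RUNNING)

// Binding only the Configuration avoids echoing the Job itself:
val conf = {
  val j = org.apache.hadoop.mapreduce.Job.getInstance(sc.hadoopConfiguration)
  j.getConfiguration   // Configuration.toString() is safe for the REPL to call
}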
Re: Configuring custom input format
Yeah, unfortunately that will be up to them to fix, though it wouldn't hurt to send them a JIRA mentioning this.

Matei

On Nov 25, 2014, at 2:58 PM, Corey Nolet cjno...@gmail.com wrote:

I was wiring up my job in the shell while I was learning Spark/Scala. I'm getting more comfortable with both now, so I've mostly been testing through IntelliJ with mock data as inputs. I think the problem lies more with Hadoop than with Spark: the Job object seems to check its state and throw an exception when toString() is called before the job has actually been submitted.
Re: Configuring custom input format
The closer I look at the stack trace in the Scala shell, the more it appears to be the call to toString() that is causing the construction of the Job object to fail. Is there a way to suppress this output, since it appears to be hindering my ability to new up this object?

On Wed, Nov 5, 2014 at 5:49 PM, Corey Nolet cjno...@gmail.com wrote:

I'm trying to use a custom input format with SparkContext.newAPIHadoopRDD. Creating the new RDD works fine, but setting up the configuration via the static methods on input formats that require a Hadoop Job object is proving to be difficult. Trying to new up my own Job object with the SparkContext.hadoopConfiguration throws the exception on line 283 of this grepcode: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.5.0/org/apache/hadoop/mapreduce/Job.java#Job

Looking in the SparkContext code, I can see that it news up Job objects just fine using nothing but the configuration. Using SparkContext.textFile() works for me. Any ideas? Has anyone else run into this as well? Would it be possible to have a method like SparkContext.getJob() or something similar?

Thanks.
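For context, the setup being described looks roughly like the sketch below when written in a compiled application rather than the shell (where the toString() echo problem does not arise). The Job is only a carrier for configuration and is never submitted; MyInputFormat and its static setDelimiter helper are hypothetical stand-ins for whatever static configuration methods a real custom format exposes:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object CustomFormatExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("custom-input-format"))

    // Static input-format setup methods in the new Hadoop API take a Job,
    // even though the job itself is never run; it only holds configuration.
    val job = Job.getInstance(sc.hadoopConfiguration)
    FileInputFormat.addInputPath(job, new Path(args(0)))
    MyInputFormat.setDelimiter(job, ",")   // hypothetical static config method

    val rdd = sc.newAPIHadoopRDD(
      job.getConfiguration,
      classOf[MyInputFormat],   // hypothetical custom InputFormat[LongWritable, Text]
      classOf[LongWritable],
      classOf[Text])

    println(rdd.count())
    sc.stop()
  }
}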