Re: Configuring custom input format

2014-11-25 Thread Harihar Nahak
Hi, 

I'm trying to build a custom input format for a CSV file as well. Could you
share a bit more about what you're reading as input and what you've
implemented? I'll try to replicate the same thing, and if I find something
interesting at my end I'll let you know.

Thanks,
Harihar







Re: Configuring custom input format

2014-11-25 Thread Matei Zaharia
How are you creating the object in your Scala shell? Maybe you can write a 
function that directly returns the RDD, without assigning the object to a 
temporary variable.
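
For example, a minimal sketch of that approach (MyInputFormat here is a
hypothetical custom InputFormat[LongWritable, Text]; swap in your own):

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.spark.rdd.RDD

    // Wrapping the Job in a function keeps the shell from binding it to a
    // top-level val, so the REPL never calls toString() on the unsubmitted Job.
    def loadCustom(path: String): RDD[(LongWritable, Text)] = {
      val job = Job.getInstance(sc.hadoopConfiguration)
      FileInputFormat.setInputPaths(job, path)  // static setup that needs the Job
      sc.newAPIHadoopRDD(job.getConfiguration,
        classOf[MyInputFormat],  // hypothetical custom InputFormat
        classOf[LongWritable], classOf[Text])
    }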

Matei

 On Nov 5, 2014, at 2:54 PM, Corey Nolet cjno...@gmail.com wrote:
 
 The closer I look at the stack trace in the Scala shell, the more it appears 
 that the call to toString() is causing the construction of the Job object to 
 fail. Is there a way to suppress this output, since it appears to be 
 hindering my ability to new up this object?
 
 On Wed, Nov 5, 2014 at 5:49 PM, Corey Nolet cjno...@gmail.com wrote:
 I'm trying to use a custom input format with SparkContext.newAPIHadoopRDD. 
 Creating the new RDD works fine, but setting up the configuration via the 
 static methods on input formats that require a Hadoop Job object is proving 
 to be difficult. 
 
 Trying to new up my own Job object with the SparkContext.hadoopConfiguration 
 throws an exception at line 283 of this grepcode:
 
 http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.5.0/org/apache/hadoop/mapreduce/Job.java#Job
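 
 For reference, the failing pattern in the shell looks roughly like this (a 
 sketch; the exception text is what Hadoop's state check produces for an 
 unsubmitted Job):
 
     scala> val job = org.apache.hadoop.mapreduce.Job.getInstance(sc.hadoopConfiguration)
     java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING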
 
 Looking in the SparkContext code, I'm seeing that it's newing up Job objects 
 just fine using nothing but the configuration. Using SparkContext.textFile() 
 appears to be working for me. Any ideas? Has anyone else run into this as 
 well? Is it possible to have a method like SparkContext.getJob() or something 
 similar?
 
 Thanks.
 
 



Re: Configuring custom input format

2014-11-25 Thread Corey Nolet
I was wiring up my job in the shell while I was learning Spark/Scala. I'm
getting more comfortable with them both now, so I've been mostly testing
through IntelliJ with mock data as inputs.

I think the problem lies more with Hadoop than Spark, as the Job object seems
to check its state and throw an exception when the toString() method is
called before the Job has actually been submitted.
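
A quick way to confirm that diagnosis outside the REPL (a sketch, assuming
the Hadoop 2.x behavior described above):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.Job

    val job = Job.getInstance(new Configuration())
    // Calling toString() explicitly trips the same state check the REPL hits
    // implicitly when it echoes the value of a top-level val.
    try { job.toString } catch {
      case e: IllegalStateException =>
        println(e.getMessage)  // e.g. "Job in state DEFINE instead of RUNNING"
    }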

On Tue, Nov 25, 2014 at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 How are you creating the object in your Scala shell? Maybe you can write a
 function that directly returns the RDD, without assigning the object to a
 temporary variable.

 Matei

 On Nov 5, 2014, at 2:54 PM, Corey Nolet cjno...@gmail.com wrote:

 The closer I look at the stack trace in the Scala shell, the more it appears
 that the call to toString() is causing the construction of the Job object
 to fail. Is there a way to suppress this output, since it appears to be
 hindering my ability to new up this object?

 On Wed, Nov 5, 2014 at 5:49 PM, Corey Nolet cjno...@gmail.com wrote:

 I'm trying to use a custom input format with
 SparkContext.newAPIHadoopRDD. Creating the new RDD works fine, but setting
 up the configuration via the static methods on input formats that
 require a Hadoop Job object is proving to be difficult.

 Trying to new up my own Job object with the
 SparkContext.hadoopConfiguration throws an exception at line 283 of
 this grepcode:


 http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.5.0/org/apache/hadoop/mapreduce/Job.java#Job

 Looking in the SparkContext code, I'm seeing that it's newing up Job
 objects just fine using nothing but the configuration. Using
 SparkContext.textFile() appears to be working for me. Any ideas? Has anyone
 else run into this as well? Is it possible to have a method like
 SparkContext.getJob() or something similar?

 Thanks.






Re: Configuring custom input format

2014-11-25 Thread Matei Zaharia
Yeah, unfortunately that will be up to them to fix, though it wouldn't hurt to 
send them a JIRA mentioning this.

Matei

 On Nov 25, 2014, at 2:58 PM, Corey Nolet cjno...@gmail.com wrote:
 
 I was wiring up my job in the shell while I was learning Spark/Scala. I'm 
 getting more comfortable with them both now, so I've been mostly testing 
 through IntelliJ with mock data as inputs.
 
 I think the problem lies more with Hadoop than Spark, as the Job object seems to 
 check its state and throw an exception when the toString() method is called 
 before the Job has actually been submitted.
 
 On Tue, Nov 25, 2014 at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
 How are you creating the object in your Scala shell? Maybe you can write a 
 function that directly returns the RDD, without assigning the object to a 
 temporary variable.
 
 Matei
 
 On Nov 5, 2014, at 2:54 PM, Corey Nolet cjno...@gmail.com wrote:
 
 The closer I look at the stack trace in the Scala shell, the more it appears 
 that the call to toString() is causing the construction of the Job object to 
 fail. Is there a way to suppress this output, since it appears to be 
 hindering my ability to new up this object?
 
 On Wed, Nov 5, 2014 at 5:49 PM, Corey Nolet cjno...@gmail.com wrote:
 I'm trying to use a custom input format with SparkContext.newAPIHadoopRDD. 
 Creating the new RDD works fine, but setting up the configuration via 
 the static methods on input formats that require a Hadoop Job object is 
 proving to be difficult. 
 
 Trying to new up my own Job object with the SparkContext.hadoopConfiguration 
 throws an exception at line 283 of this grepcode:
 
 http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.5.0/org/apache/hadoop/mapreduce/Job.java#Job
 
 Looking in the SparkContext code, I'm seeing that it's newing up Job objects 
 just fine using nothing but the configuration. Using SparkContext.textFile() 
 appears to be working for me. Any ideas? Has anyone else run into this as 
 well? Is it possible to have a method like SparkContext.getJob() or 
 something similar?
 
 Thanks.
 
 
 
 



Re: Configuring custom input format

2014-11-05 Thread Corey Nolet
The closer I look at the stack trace in the Scala shell, the more it appears
that the call to toString() is causing the construction of the Job object
to fail. Is there a way to suppress this output, since it appears to be
hindering my ability to new up this object?
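
One way to sidestep the shell's echo, as a sketch (the path and input format
here are placeholders): keep the Job local to a block, so the only top-level
value the REPL prints is the resulting RDD.

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}

    // The Job never becomes a top-level REPL value, so its toString() is
    // never invoked; only the RDD's (harmless) toString gets printed.
    val rdd = {
      val job = Job.getInstance(sc.hadoopConfiguration)
      FileInputFormat.setInputPaths(job, "/path/to/input")  // placeholder path
      sc.newAPIHadoopRDD(job.getConfiguration,
        classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
    }

The REPL's :silent command should also suppress result printing entirely, if
you'd rather keep the top-level val.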

On Wed, Nov 5, 2014 at 5:49 PM, Corey Nolet cjno...@gmail.com wrote:

 I'm trying to use a custom input format with SparkContext.newAPIHadoopRDD.
 Creating the new RDD works fine, but setting up the configuration via
 the static methods on input formats that require a Hadoop Job object is
 proving to be difficult.

 Trying to new up my own Job object with the
 SparkContext.hadoopConfiguration throws an exception at line 283 of
 this grepcode:


 http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.5.0/org/apache/hadoop/mapreduce/Job.java#Job

 Looking in the SparkContext code, I'm seeing that it's newing up Job
 objects just fine using nothing but the configuration. Using
 SparkContext.textFile() appears to be working for me. Any ideas? Has anyone
 else run into this as well? Is it possible to have a method like
 SparkContext.getJob() or something similar?

 Thanks.