Re: Spark RuntimeException hadoop output format

2015-08-14 Thread Ted Yu
First you create the file:

final File outputFile = new File(outputPath);

Then you write to it:
Files.append(counts + "\n", outputFile, Charset.defaultCharset());
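
That append call is Guava's Files.append. As a minimal, self-contained sketch of the same append-to-local-file idea using only the JDK (the class name, file path, and sample string below are hypothetical stand-ins, not from the thread's code):

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendDemo {
    // Append one line to the file, creating it on first use,
    // then return the file's full contents (JDK analogue of Guava's Files.append).
    public static String appendLine(Path outputFile, String counts) throws IOException {
        Files.write(outputFile, (counts + "\n").getBytes(Charset.defaultCharset()),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return new String(Files.readAllBytes(outputFile), Charset.defaultCharset());
    }

    public static void main(String[] args) throws IOException {
        Path outputFile = Files.createTempFile("out", ".txt");
        System.out.print(appendLine(outputFile, "Counts at time 1439495124000 ms [(hello,2)]"));
    }
}
```

Each call appends rather than truncates, so repeated batches accumulate lines in the same file, which matches the growing /tmp/out file seen later in the thread.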

Cheers

On Fri, Aug 14, 2015 at 4:38 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:

 I thought prefix meant the output path? What's the purpose of prefix and
 where do I specify the path if not in prefix?

 On Fri, Aug 14, 2015 at 4:36 PM, Ted Yu yuzhih...@gmail.com wrote:

 Please take a look at JavaPairDStream.scala:
  def saveAsHadoopFiles[F <: OutputFormat[_, _]](
   prefix: String,
   suffix: String,
   keyClass: Class[_],
   valueClass: Class[_],
   outputFormatClass: Class[F]) {

 Did you intend to use outputPath as prefix ?

 Cheers
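
 To clarify the prefix question: the prefix carries the full output path plus a
 filename stem, and Spark Streaming names each batch's output
 prefix-batchTime.suffix — which is why the /tmp listing quoted later in the
 thread shows entries like /tmp/out-1439495124000.txt. A sketch of that naming
 convention (a hypothetical helper modeled on the observed pattern, not Spark's
 actual implementation):

```java
public class BatchFileName {
    // Models the prefix-TIME.suffix naming visible in the HDFS listing,
    // e.g. ("/tmp/out", "txt", 1439495124000L) -> "/tmp/out-1439495124000.txt"
    public static String rddToFileName(String prefix, String suffix, long timeMs) {
        String name = prefix + "-" + timeMs;
        return (suffix == null || suffix.isEmpty()) ? name : name + "." + suffix;
    }

    public static void main(String[] args) {
        System.out.println(rddToFileName("/tmp/out", "txt", 1439495124000L));
        // prints "/tmp/out-1439495124000.txt"
    }
}
```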


 On Fri, Aug 14, 2015 at 1:36 PM, Mohit Anchlia mohitanch...@gmail.com
 wrote:

 Spark 1.3

 Code:

 wordCounts.foreachRDD(new Function2<JavaPairRDD<String, Integer>, Time, Void>() {
   @Override
   public Void call(JavaPairRDD<String, Integer> rdd, Time time) throws IOException {
     String counts = "Counts at time " + time + " " + rdd.collect();
     System.out.println(counts);
     System.out.println("Appending to " + outputFile.getAbsolutePath());
     Files.append(counts + "\n", outputFile, Charset.defaultCharset());
     return null;
   }
 });

 wordCounts.saveAsHadoopFiles(outputPath, "txt", Text.class, Text.class,
     TextOutputFormat.class);


 What do I need to check in the namenode? I see 0-byte files like this:


 drwxr-xr-x   - ec2-user supergroup  0 2015-08-13 15:45
 /tmp/out-1439495124000.txt
 drwxr-xr-x   - ec2-user supergroup  0 2015-08-13 15:45
 /tmp/out-1439495125000.txt
 drwxr-xr-x   - ec2-user supergroup  0 2015-08-13 15:45
 /tmp/out-1439495126000.txt
 drwxr-xr-x   - ec2-user supergroup  0 2015-08-13 15:45
 /tmp/out-1439495127000.txt
 drwxr-xr-x   - ec2-user supergroup  0 2015-08-13 15:45
 /tmp/out-1439495128000.txt



 However, I also wrote the data to a file on the local file system for
 verification, and I see the data:


 $ ls -ltr !$
 ls -ltr /tmp/out
 -rw-r--r-- 1 yarn yarn 5230 Aug 13 15:45 /tmp/out


 On Fri, Aug 14, 2015 at 6:15 AM, Ted Yu yuzhih...@gmail.com wrote:

 Which Spark release are you using ?

 Can you show us snippet of your code ?

 Have you checked namenode log ?

 Thanks



 On Aug 13, 2015, at 10:21 PM, Mohit Anchlia mohitanch...@gmail.com
 wrote:

 I was able to get this working by using an alternative method; however, I
 only see 0-byte files in Hadoop. I've verified that the output does exist
 in the logs, but it's missing from HDFS.

 On Thu, Aug 13, 2015 at 10:49 AM, Mohit Anchlia mohitanch...@gmail.com
  wrote:

 I have this call trying to save to hdfs 2.6

 wordCounts.saveAsNewAPIHadoopFiles(prefix, "txt");

 but I am getting the following:
 java.lang.RuntimeException: class scala.runtime.Nothing$ not
 org.apache.hadoop.mapreduce.OutputFormat
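
 For reference, that scala.runtime.Nothing$ error is typical of the
 two-argument saveAsNewAPIHadoopFiles(prefix, suffix) overload: with no class
 arguments, Scala's type inference fills in Nothing for the output-format type
 parameter, which is not an org.apache.hadoop.mapreduce.OutputFormat. The usual
 fix is the overload that takes the classes explicitly (a sketch only — it
 assumes Text keys and values, as in the working saveAsHadoopFiles call above,
 and requires Spark Streaming and Hadoop on the classpath):

```java
// Sketch: pass key, value, and output-format classes explicitly so no
// type parameter is inferred as Nothing.
wordCounts.saveAsNewAPIHadoopFiles(
    prefix, "txt",
    Text.class, Text.class,
    org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.class);
```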







