Re: Spark RuntimeException hadoop output format
First you create the file:

    final File outputFile = new File(outputPath);

Then you write to it:

    Files.append(counts + "\n", outputFile, Charset.defaultCharset());

Cheers

On Fri, Aug 14, 2015 at 4:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
> I thought prefix meant the output path? What's the purpose of prefix and where do I specify the path if not in prefix? ...
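The create-then-append pattern above uses Guava's Files.append. As a runnable sketch of the same local-file verification trick using only the JDK (java.nio.file instead of Guava, so nothing extra is needed on the classpath; the path and messages here are illustrative, not from the thread):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class AppendSketch {
    // CREATE + APPEND mirrors "first you create the file, then you write to it":
    // the first call creates the file, later calls append to it.
    static void appendLine(Path outputFile, String counts) throws IOException {
        Files.write(outputFile, (counts + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(System.getProperty("java.io.tmpdir"), "append-sketch.txt");
        Files.deleteIfExists(out);
        // In the thread this string is built inside foreachRDD from the batch time
        // and rdd.collect(); here it is hard-coded for illustration.
        appendLine(out, "Counts at time 1439495124000 ms [(hello,2)]");
        appendLine(out, "Counts at time 1439495125000 ms [(hello,3)]");
        System.out.println(Files.readAllLines(out).size()); // number of appended lines
    }
}
```

Note that this only proves the data reached the driver; it says nothing about what saveAsHadoopFiles wrote to HDFS, which is the separate problem discussed below in the thread.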
Re: Spark RuntimeException hadoop output format
Please take a look at JavaPairDStream.scala:

    def saveAsHadoopFiles[F <: OutputFormat[_, _]](
        prefix: String,
        suffix: String,
        keyClass: Class[_],
        valueClass: Class[_],
        outputFormatClass: Class[F]) {

Did you intend to use outputPath as prefix ?

Cheers

On Fri, Aug 14, 2015 at 1:36 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
> Spark 1.3 Code: ... wordCounts.saveAsHadoopFiles(outputPath, "txt", Text.class, Text.class, TextOutputFormat.class); What do I need to check in namenode? ...
Re: Spark RuntimeException hadoop output format
Spark 1.3

Code:

    wordCounts.foreachRDD(new Function2<JavaPairRDD<String, Integer>, Time, Void>() {
      @Override
      public Void call(JavaPairRDD<String, Integer> rdd, Time time) throws IOException {
        String counts = "Counts at time " + time + " " + rdd.collect();
        System.out.println(counts);
        System.out.println("Appending to " + outputFile.getAbsolutePath());
        Files.append(counts + "\n", outputFile, Charset.defaultCharset());
        return null;
      }
    });

    wordCounts.saveAsHadoopFiles(outputPath, "txt", Text.class, Text.class, TextOutputFormat.class);

What do I need to check in namenode? I see 0 byte files like this:

    drwxr-xr-x - ec2-user supergroup 0 2015-08-13 15:45 /tmp/out-1439495124000.txt
    drwxr-xr-x - ec2-user supergroup 0 2015-08-13 15:45 /tmp/out-1439495125000.txt
    drwxr-xr-x - ec2-user supergroup 0 2015-08-13 15:45 /tmp/out-1439495126000.txt
    drwxr-xr-x - ec2-user supergroup 0 2015-08-13 15:45 /tmp/out-1439495127000.txt
    drwxr-xr-x - ec2-user supergroup 0 2015-08-13 15:45 /tmp/out-1439495128000.txt

However, I also wrote data to a local file on the local file system for verification and I see the data:

    $ ls -ltr !$
    ls -ltr /tmp/out
    -rw-r--r-- 1 yarn yarn 5230 Aug 13 15:45 /tmp/out

On Fri, Aug 14, 2015 at 6:15 AM, Ted Yu yuzhih...@gmail.com wrote:
> Which Spark release are you using ? Can you show us snippet of your code ? Have you checked namenode log ? ...
Re: Spark RuntimeException hadoop output format
I thought prefix meant the output path? What's the purpose of prefix and where do I specify the path if not in prefix?

On Fri, Aug 14, 2015 at 4:36 PM, Ted Yu yuzhih...@gmail.com wrote:
> Please take a look at JavaPairDStream.scala ... Did you intend to use outputPath as prefix ? ...
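To answer the prefix question concretely: Spark Streaming derives each batch's output path from the prefix, the batch time, and the suffix. The sketch below paraphrases Spark's internal rddToFileName helper (a paraphrase for illustration, not the actual source), and reproduces the names seen in the HDFS listing in this thread, which shows that the prefix is indeed the output path plus a basename:

```java
public class FileNameSketch {
    // "<prefix>-<batchTimeMs>.<suffix>"; the suffix is dropped when empty.
    static String rddToFileName(String prefix, String suffix, long timeMs) {
        String name = prefix + "-" + timeMs;
        return (suffix == null || suffix.isEmpty()) ? name : name + "." + suffix;
    }

    public static void main(String[] args) {
        // Reproduces one of the per-batch names from the listing above.
        System.out.println(rddToFileName("/tmp/out", "txt", 1439495124000L));
        // -> /tmp/out-1439495124000.txt
    }
}
```

So with prefix "/tmp/out" and suffix "txt", each batch gets its own "/tmp/out-<time>.txt". Each of those is a Hadoop output *directory* (hence the drwxr-xr-x entries of size 0); the actual data lives in part-* files inside it.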
Re: Spark RuntimeException hadoop output format
Which Spark release are you using ?

Can you show us snippet of your code ?

Have you checked namenode log ?

Thanks

On Aug 13, 2015, at 10:21 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
> I was able to get this working by using an alternative method however I only see 0 byte files in hadoop. ...
Spark RuntimeException hadoop output format
I have this call trying to save to hdfs 2.6:

    wordCounts.saveAsNewAPIHadoopFiles(prefix, "txt");

but I am getting the following:

    java.lang.RuntimeException: class scala.runtime.Nothing$ not org.apache.hadoop.mapreduce.OutputFormat
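Why the error mentions scala.runtime.Nothing$: the two-argument saveAsNewAPIHadoopFiles(prefix, suffix) leaves the key, value, and OutputFormat type parameters to be inferred, and when called from Java they collapse to Scala's bottom type Nothing. Hadoop then rejects Nothing$ because it does not implement OutputFormat; the fix discussed later in the thread is to pass the classes explicitly, e.g. saveAsNewAPIHadoopFiles(prefix, "txt", Text.class, Text.class, TextOutputFormat.class). The assignability check behind the message can be sketched in plain Java (a paraphrase for illustration using stand-in classes, not Hadoop's actual source):

```java
public class OutputFormatCheck {
    // Hadoop verifies that the configured class implements the expected
    // interface and throws RuntimeException("<class> not <interface>") otherwise.
    static void requireImplements(Class<?> theClass, Class<?> xface) {
        if (!xface.isAssignableFrom(theClass)) {
            throw new RuntimeException(theClass + " not " + xface.getName());
        }
    }

    interface OutputFormat {}                                 // stand-in for org.apache.hadoop.mapreduce.OutputFormat
    static class TextOutputFormat implements OutputFormat {}  // stand-in for the real TextOutputFormat
    static class NothingStandIn {}                            // plays the role of scala.runtime.Nothing$

    public static void main(String[] args) {
        requireImplements(TextOutputFormat.class, OutputFormat.class); // passes: real output formats are fine
        try {
            requireImplements(NothingStandIn.class, OutputFormat.class); // the inferred Nothing$ case
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The failing case mirrors the reported stack trace: a class that never implements OutputFormat was handed to Hadoop, which is exactly what type inference produced here.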
Re: Spark RuntimeException hadoop output format
I was able to get this working by using an alternative method, however I only see 0 byte files in hadoop. I've verified that the output does exist in the logs, however it's missing from hdfs.

On Thu, Aug 13, 2015 at 10:49 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
> I have this call trying to save to hdfs 2.6: wordCounts.saveAsNewAPIHadoopFiles(prefix, "txt"); but I am getting the following: java.lang.RuntimeException: class scala.runtime.Nothing$ not org.apache.hadoop.mapreduce.OutputFormat