Hive 1.0.0 not able to read Spark 1.6.1 parquet output files on EMR 4.7.0

2016-06-13 Thread mayankshete
Hello Team,

I am facing an issue where output files generated by Spark 1.6.1 cannot be
read by Hive 1.0.0. This appears to be because Hive 1.0.0 uses an older
Parquet version than Spark 1.6.1, which uses Parquet 1.7.0.

Is it possible to use an older Parquet version in Spark, or a newer Parquet
version in Hive?
I have tried adding parquet-hive-bundle 1.7.0 to Hive, but reading then
fails with: Failed with exception
java.io.IOException: java.lang.NullPointerException.
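One direction worth trying (a hedged sketch, untested on EMR 4.7.0; the job
class and jar name below are placeholders) is to ask Spark 1.6.1 to write
Parquet in the legacy format that pre-1.7.x readers understand, via the
spark.sql.parquet.writeLegacyFormat setting:

```shell
# Hypothetical invocation: the class and jar are placeholders. The flag asks
# Spark SQL to write Parquet in the older, Hive-compatible layout; whether
# it resolves this specific EMR 4.7.0 issue is untested.
spark-submit \
  --conf spark.sql.parquet.writeLegacyFormat=true \
  --class com.example.MyJob \
  my-job.jar
```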

Can anyone suggest a solution?

Thanks,
Mayank



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Hive-1-0-0-not-able-to-read-Spark-1-6-1-parquet-output-files-on-EMR-4-7-0-tp27144.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark Twitter Stream throws Null Pointer Exception

2016-06-01 Thread mayankshete
Hello Team,

Can anyone tell me why the code below throws a NullPointerException in
yarn-client mode, while it runs fine in local mode?

  val filters = args.takeRight(0) // note: takeRight(0) always yields an empty array
  val sparkConf = new SparkConf().setAppName("TwitterAnalyzer")
  val ssc = new StreamingContext(sparkConf, Seconds(2))
  val stream = TwitterUtils.createStream(ssc, None, filters)
  val training = ssc.textFileStream("/user/hadoop/Training")
  val tf = new HashingTF(numFeatures = 140)
  val text = stream.filter(x => x != null)
    .filter(x => x.getLang() == "en")
    .map(x => x.getText)
    .filter(tweet => tweet != null)
    .map(tweet => tf.transform(tweet.split(" ")))
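A Spark-free sketch of one way such an NPE can survive the filters above
(the Status type here is my own stand-in, not the twitter4j API): a status
reference can be non-null while a field such as its text is null, so the
null check must happen after extracting the field. Wrapping each nullable
field in Option drops both cases safely:

```scala
// Hypothetical stand-in for twitter4j.Status, for illustration only.
case class Status(lang: String, text: String)

val statuses: Seq[Status] = Seq(
  Status("en", "hello world"),
  Status("en", null),          // non-null status, null text field
  null                         // null status
)

val texts: Seq[Seq[String]] = statuses
  .filter(_ != null)                  // drop null statuses
  .filter(_.lang == "en")             // keep English only
  .flatMap(s => Option(s.text))       // drop null text fields safely
  .map(_.split(" ").toSeq)
```

In yarn-client mode the closure runs on remote executors, so a null that
local mode happened to tolerate only surfaces there; making every field
access null-safe is the defensive fix either way.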

Here is the stack trace of the error (Program is the user class):

 java.lang.NullPointerException
        at com.Program$$anonfun$5.apply(Program.scala:40)
        at com.Program$$anonfun$5.apply(Program.scala:40)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$28.apply(RDD.scala:1328)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$28.apply(RDD.scala:1328)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Twitter-Stream-throws-Null-Pointer-Exception-tp27060.html



ANOVA test in Spark

2016-05-13 Thread mayankshete
Is ANOVA available in Spark MLlib? If not, when will this feature be
available in Spark?
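For what it is worth, the F-statistic behind a one-way ANOVA is simple
enough to compute by hand while waiting for built-in support. The sketch
below is plain Scala on in-memory groups (a hand-rolled helper of my own,
not a Spark or MLlib API):

```scala
// One-way ANOVA F-statistic computed directly from its definition:
// F = (between-group SS / (k - 1)) / (within-group SS / (N - k)).
def oneWayAnovaF(groups: Seq[Seq[Double]]): Double = {
  val n = groups.map(_.size).sum              // total observations N
  val k = groups.size                         // number of groups k
  val grandMean = groups.flatten.sum / n
  val ssBetween = groups.map { g =>
    val m = g.sum / g.size
    g.size * math.pow(m - grandMean, 2)       // n_g * (mean_g - grand mean)^2
  }.sum
  val ssWithin = groups.map { g =>
    val m = g.sum / g.size
    g.map(x => math.pow(x - m, 2)).sum        // squared deviations within group
  }.sum
  (ssBetween / (k - 1)) / (ssWithin / (n - k))
}
```

For distributed data the per-group sums, sums of squares, and counts could
be aggregated first and the same formula applied to the aggregates.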



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ANOVA-test-in-Spark-tp26949.html



Create multiple output files from Thriftserver

2016-04-28 Thread mayankshete
Is there a way to create multiple output files when connected from beeline to
the Thriftserver?
Right now I am using beeline -e 'query' > output.txt, which is inefficient
because it funnels all output through the client and relies on a Linux shell
redirect to collect it into a single file.
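One possible alternative (a sketch only, untested against this setup; the
JDBC URL, output path, and table name are placeholders) is to have the
query write its result set to HDFS directly with Hive's INSERT OVERWRITE
DIRECTORY syntax, so each task writes its own part file instead of
everything passing through the beeline client:

```shell
# Hypothetical invocation: URL, path, and table are placeholders.
# If the Thriftserver accepts this HiveQL statement, each task writes
# its own part-* file under the target directory in parallel.
beeline -u jdbc:hive2://localhost:10000 \
  -e "INSERT OVERWRITE DIRECTORY '/user/hadoop/query_output' SELECT * FROM my_table"
```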



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Create-multiple-output-files-from-Thriftserver-tp26845.html