Re: Re: Spark SQL 1.3.1 "saveAsParquetFile" will output tachyon file with different block size

2015-04-28 Thread zhangxiongfei
Hi,
Actually I did not use Tachyon 0.6.3; I just compiled Spark against 0.5.0 with
make-distribution.sh. When I pulled the Spark code from GitHub, the Tachyon
version was still 0.5.0 in pom.xml.

Regards
Zhang




At 2015-04-29 04:19:20, "sara mustafa"  wrote:
>Hi Zhang,
>
>How did you compile Spark 1.3.1 with Tachyon? When I changed the Tachyon version
>to 0.6.3 in core/pom.xml and make-distribution.sh and tried to compile again,
>many compilation errors were raised.
>
>Thanks,
>
>
>
>
>--
>View this message in context: 
>http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-1-3-1-saveAsParquetFile-will-output-tachyon-file-with-different-block-size-tp11561p11870.html
>Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
>-
>To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>For additional commands, e-mail: dev-h...@spark.apache.org
>


Re: Spark SQL 1.3.1 "saveAsParquetFile" will output tachyon file with different block size

2015-04-28 Thread Calvin Jia
Hi,

You can apply this patch and recompile.

Hope this helps,
Calvin

On Tue, Apr 28, 2015 at 1:19 PM, sara mustafa 
wrote:

> Hi Zhang,
>
> How did you compile Spark 1.3.1 with Tachyon? When I changed the Tachyon
> version to 0.6.3 in core/pom.xml and make-distribution.sh and tried to
> compile again, many compilation errors were raised.
>
> Thanks,
>
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-1-3-1-saveAsParquetFile-will-output-tachyon-file-with-different-block-size-tp11561p11870.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Spark SQL 1.3.1 "saveAsParquetFile" will output tachyon file with different block size

2015-04-28 Thread sara mustafa
Hi Zhang,

How did you compile Spark 1.3.1 with Tachyon? When I changed the Tachyon version
to 0.6.3 in core/pom.xml and make-distribution.sh and tried to compile again,
many compilation errors were raised.

Thanks,




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-1-3-1-saveAsParquetFile-will-output-tachyon-file-with-different-block-size-tp11561p11870.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



RE: [SQL][Feature] Access row by column name instead of index

2015-04-28 Thread Shuai Zheng
Hi,

I added a few helper methods for this, to make Java developers' lives easier,
since we can't benefit from the generic getAs[T] from Java.

Please let me know if I should not have done that.

Regards,

Shuai

From: Michael Armbrust [mailto:mich...@databricks.com] 
Sent: Friday, April 24, 2015 5:57 PM
To: Reynold Xin
Cc: Shuai Zheng; dev
Subject: Re: [SQL][Feature] Access row by column name instead of index

 

Already done :)

 

https://github.com/apache/spark/commit/2e8c6ca47df14681c1110f0736234ce76a3eca9b
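
For illustration, a minimal sketch of what that change enables (the case class,
data, and context setup below are made up for this example, and it assumes a
Spark build that includes the commit above):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Hypothetical case class used only for this example.
    case class Person(name: String, age: Int)

    object RowByNameExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("row-by-name").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        val df = sc.parallelize(Seq(Person("alice", 42))).toDF()

        df.collect().foreach { row =>
          val byIndex = row.getInt(1)          // positional access, available before the change
          val byName  = row.getAs[Int]("age")  // access by column name, enabled by the commit above
          println(s"byIndex=$byIndex, byName=$byName")
        }

        sc.stop()
      }
    }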

 

On Fri, Apr 24, 2015 at 2:37 PM, Reynold Xin  wrote:

Can you elaborate what you mean by that? (what's already available in
Python?)



On Fri, Apr 24, 2015 at 2:24 PM, Shuai Zheng  wrote:

> Hi All,
>
> I want to ask whether there is a plan to implement the feature of accessing
> a Row in SQL by name. Currently we are only allowed to access a row by
> index (there is a Python API for access by name, but none for Java).
>
> Regards,
>
> Shuai
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>

 



SPARK_SUBMIT_CLASSPATH Windows PYSPARK

2015-04-28 Thread art1i
Hi,

So I was trying to get Kafka streaming working in a standalone Python
application.

I needed to add the dependencies for this to work. The suggested way to do
this is to pass --jars to spark-submit, which is not practical given that I
wanted to launch and debug the application directly. Also, you still have to
set --driver-class-path for this to work.

So after searching around I found a JIRA which said that setting
SPARK_SUBMIT_CLASSPATH is a workaround. This works on Linux, but on Windows
just setting the environment variable did not seem to work. So I looked into
spark-submit.cmd, and it turns out SPARK_SUBMIT_CLASSPATH is being set to:

set SPARK_SUBMIT_CLASSPATH= 

This is then overridden if --driver-class-path is supplied. But I did not
want to supply that argument; I just wanted my environment variable to
persist, as it does on Linux. So instead I changed the line to:

set SPARK_SUBMIT_CLASSPATH=%SPARK_SUBMIT_CLASSPATH%

Now everything works as expected and I can just inject the variable in my
Python script before importing pyspark:

os.environ["SPARK_SUBMIT_CLASSPATH"]="path1.jar;path2.jar"

So I was wondering whether there is any reason this is not the default
behavior, or am I just doing something wrong?

This is my first time posting, please excuse me if this is the wrong place
to post this.

Cheers,

Artyom
 
 



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-SUBMIT-CLASSPATH-Windows-PYSPARK-tp11868.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



How to deploy self-built Spark source code on EC2

2015-04-28 Thread Bo Fu
Hi all,

I have an issue. I added some timestamps to the Spark source code and built it
using:

mvn package -DskipTests

I checked the new version on my own computer and it works. However, when I run
Spark on EC2, the code the EC2 machines run is still the original version.

Does anyone know how to deploy the modified Spark source code onto EC2?
Thanks a lot


Bo Fu
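
One possible approach (a sketch only, untested; it assumes the spark-ec2
cluster layout with Spark installed under /root/spark, and the hostname and
build flags are placeholders) is to build a full distribution from the
modified source and copy it over the existing install on each node:

    # Build a distributable tarball from the modified source tree
    # (flags shown are a sketch; adjust build profiles as needed).
    ./make-distribution.sh --tgz -DskipTests

    # Copy it to the master (and likewise to each slave), unpack it over the
    # existing install, and restart the standalone daemons.
    scp spark-*-bin-*.tgz root@ec2-xx-xx-xx-xx.compute-1.amazonaws.com:/root/
    ssh root@ec2-xx-xx-xx-xx.compute-1.amazonaws.com \
      'tar xzf spark-*-bin-*.tgz && cp -r spark-*-bin-*/* /root/spark/ && \
       /root/spark/sbin/stop-all.sh && /root/spark/sbin/start-all.sh'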

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Re: java.lang.StackOverflowError when recovery from checkpoint in Streaming

2015-04-28 Thread wyphao.2007
Hi Akhil Das, thank you for your reply.
It is very similar to my problem; I will follow it.
Thanks
Best Regards

At 2015-04-28 18:08:32,"Akhil Das"  wrote:
>There's a similar issue reported over here
>https://issues.apache.org/jira/browse/SPARK-6847
>
>Thanks
>Best Regards
>
>On Tue, Apr 28, 2015 at 7:35 AM, wyphao.2007  wrote:
>
>> Hi everyone, I am using
>>
>>   val messages = KafkaUtils.createDirectStream[String, String, StringDecoder,
>>     StringDecoder](ssc, kafkaParams, topicsSet)
>>
>> to read data from Kafka (1k records/second) and store the data in windows. The
>> code snippet is as follows:
>>
>>   val windowedStreamChannel =
>>     streamChannel.combineByKey[TreeSet[Obj]](TreeSet[Obj](_), _ += _, _ ++= _,
>>       new HashPartitioner(numPartition))
>>     .reduceByKeyAndWindow((x: TreeSet[Obj], y: TreeSet[Obj]) => x ++= y,
>>       (x: TreeSet[Obj], y: TreeSet[Obj]) => x --= y, Minutes(60),
>>       Seconds(2), numPartition,
>>       (item: (String, TreeSet[Obj])) => item._2.size != 0)
>>
>> After the application had run for an hour, I killed it and restarted it from
>> the checkpoint directory, but I encountered an exception:
>>
>> 2015-04-27 17:52:40,955 INFO  [Driver] - Slicing from 1430126222000 ms to
>> 1430126222000 ms (aligned to 1430126222000 ms and 1430126222000 ms)
>> 2015-04-27 17:52:40,958 ERROR [Driver] - User class threw exception: null
>> java.lang.StackOverflowError
>> at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
>> at
>> java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
>> at java.io.File.exists(File.java:813)
>> at
>> sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080)
>> at sun.misc.URLClassPath.getResource(URLClassPath.java:199)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:358)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:190)
>> at
>> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
>> at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
>> at org.apache.spark.rdd.RDD.filter(RDD.scala:303)
>> at
>> org.apache.spark.streaming.dstream.FilteredDStream$$anonfun$compute$1.apply(FilteredDStream.scala:35)
>> at
>> org.apache.spark.streaming.dstream.FilteredDStream$$anonfun$compute$1.apply(FilteredDStream.scala:35)
>> at scala.Option.map(Option.scala:145)
>> at
>> org.apache.spark.streaming.dstream.FilteredDStream.compute(FilteredDStream.scala:35)
>> at
>> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
>> at
>> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
>> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>> at
>> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
>> at
>> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287)
>> at scala.Option.orElse(Option.scala:257)
>> at
>> org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284)
>> at
>> org.apache.spark.streaming.dstream.FlatMappedDStream.compute(FlatMappedDStream.scala:35)
>> at
>> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
>> at
>> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
>> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>> at
>> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
>> at
>> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287)
>> at scala.Option.orElse(Option.scala:257)
>> at
>> org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284)
>> at
>> org.apache.spark.streaming.dstream.FilteredDStream.compute(FilteredDStream.scala:35)
>> at
>> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
>> at
>> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
>> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>> at
>> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
>> at
>> org.apache.

Re: java.lang.StackOverflowError when recovery from checkpoint in Streaming

2015-04-28 Thread Akhil Das
There's a similar issue reported over here
https://issues.apache.org/jira/browse/SPARK-6847

Thanks
Best Regards

On Tue, Apr 28, 2015 at 7:35 AM, wyphao.2007  wrote:

> Hi everyone, I am using
>
>   val messages = KafkaUtils.createDirectStream[String, String, StringDecoder,
>     StringDecoder](ssc, kafkaParams, topicsSet)
>
> to read data from Kafka (1k records/second) and store the data in windows. The
> code snippet is as follows:
>
>   val windowedStreamChannel =
>     streamChannel.combineByKey[TreeSet[Obj]](TreeSet[Obj](_), _ += _, _ ++= _,
>       new HashPartitioner(numPartition))
>     .reduceByKeyAndWindow((x: TreeSet[Obj], y: TreeSet[Obj]) => x ++= y,
>       (x: TreeSet[Obj], y: TreeSet[Obj]) => x --= y, Minutes(60),
>       Seconds(2), numPartition,
>       (item: (String, TreeSet[Obj])) => item._2.size != 0)
>
> After the application had run for an hour, I killed it and restarted it from
> the checkpoint directory, but I encountered an exception:
>
> 2015-04-27 17:52:40,955 INFO  [Driver] - Slicing from 1430126222000 ms to
> 1430126222000 ms (aligned to 1430126222000 ms and 1430126222000 ms)
> 2015-04-27 17:52:40,958 ERROR [Driver] - User class threw exception: null
> java.lang.StackOverflowError
> at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
> at
> java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
> at java.io.File.exists(File.java:813)
> at
> sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080)
> at sun.misc.URLClassPath.getResource(URLClassPath.java:199)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:358)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:190)
> at
> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
> at org.apache.spark.rdd.RDD.filter(RDD.scala:303)
> at
> org.apache.spark.streaming.dstream.FilteredDStream$$anonfun$compute$1.apply(FilteredDStream.scala:35)
> at
> org.apache.spark.streaming.dstream.FilteredDStream$$anonfun$compute$1.apply(FilteredDStream.scala:35)
> at scala.Option.map(Option.scala:145)
> at
> org.apache.spark.streaming.dstream.FilteredDStream.compute(FilteredDStream.scala:35)
> at
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
> at
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> at
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
> at
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287)
> at scala.Option.orElse(Option.scala:257)
> at
> org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284)
> at
> org.apache.spark.streaming.dstream.FlatMappedDStream.compute(FlatMappedDStream.scala:35)
> at
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
> at
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> at
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
> at
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287)
> at scala.Option.orElse(Option.scala:257)
> at
> org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284)
> at
> org.apache.spark.streaming.dstream.FilteredDStream.compute(FilteredDStream.scala:35)
> at
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
> at
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> at
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
> at
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:287)
> at scala.Option.orElse(Option.scala:257)
> at
> org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:284)
> at
> org.apache.

Re: creating hive packages for spark

2015-04-28 Thread Manku Timma
Yash,
This is exactly what I wanted! Thanks a bunch.

On 27 April 2015 at 15:39, yash datta  wrote:

> Hi,
>
> You can build spark-project hive from here:
>
> https://github.com/pwendell/hive/tree/0.13.1-shaded-protobuf
>
> Hope this helps.
>
>
> On Mon, Apr 27, 2015 at 3:23 PM, Manku Timma 
> wrote:
>
>> Hello Spark developers,
>> I want to understand the procedure used to create the org.spark-project.hive
>> jars. Is this documented somewhere? I am having issues with -Phive-provided
>> and my private hive13 jars, and want to check whether using Spark's
>> procedure helps.
>>
>
>
>
> --
> When events unfold with calm and ease
> When the winds that blow are merely breeze
> Learn from nature, from birds and bees
> Live your life in love, and let joy not cease.
>