[jira] [Comment Edited] (SPARK-4103) Clean up SessionState in HiveContext

2014-10-27 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185889#comment-14185889 ] Zhan Zhang edited comment on SPARK-4103 at 10/27/14 11:10 PM

[jira] [Commented] (SPARK-2706) Enable Spark to support Hive 0.13

2014-10-24 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183452#comment-14183452 ] Zhan Zhang commented on SPARK-2706: --- I just check the trunk, the change is already

[jira] [Commented] (SPARK-2706) Enable Spark to support Hive 0.13

2014-10-24 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183493#comment-14183493 ] Zhan Zhang commented on SPARK-2706: --- Michael, Please ignore my last email. I thought

[jira] [Commented] (SPARK-2706) Enable Spark to support Hive 0.13

2014-10-16 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174073#comment-14174073 ] Zhan Zhang commented on SPARK-2706: --- The code does not go to upstream yet. To build

[jira] [Updated] (SPARK-3720) support ORC in spark sql

2014-10-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-3720: -- Attachment: orc.diff This is the diff for orc file support. Because I also work on this item

[jira] [Updated] (SPARK-2883) Spark Support for ORCFile format

2014-10-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-2883: -- Attachment: orc.diff Just for completion, I also attach the diff here besides spark-3720. Spark

[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2014-10-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173269#comment-14173269 ] Zhan Zhang commented on SPARK-2883: --- It is scala patch, and I am not very familiar

[jira] [Comment Edited] (SPARK-2883) Spark Support for ORCFile format

2014-10-14 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171654#comment-14171654 ] Zhan Zhang edited comment on SPARK-2883 at 10/14/14 10:41 PM

[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2014-10-14 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171654#comment-14171654 ] Zhan Zhang commented on SPARK-2883: --- I almost finished the prototype, and following

[jira] [Comment Edited] (SPARK-2883) Spark Support for ORCFile format

2014-10-14 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171654#comment-14171654 ] Zhan Zhang edited comment on SPARK-2883 at 10/14/14 10:44 PM

[jira] [Comment Edited] (SPARK-2883) Spark Support for ORCFile format

2014-10-14 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171654#comment-14171654 ] Zhan Zhang edited comment on SPARK-2883 at 10/14/14 11:04 PM

[jira] [Comment Edited] (SPARK-2883) Spark Support for ORCFile format

2014-10-14 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171654#comment-14171654 ] Zhan Zhang edited comment on SPARK-2883 at 10/14/14 11:59 PM

[jira] [Commented] (SPARK-3720) support ORC in spark sql

2014-10-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164222#comment-14164222 ] Zhan Zhang commented on SPARK-3720: --- There is another JIRA, SPARK-2883, opened on 06/Aug

[jira] [Commented] (SPARK-3720) support ORC in spark sql

2014-10-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164224#comment-14164224 ] Zhan Zhang commented on SPARK-3720: --- By the way, I don't think the current approach can

[jira] [Comment Edited] (SPARK-3720) support ORC in spark sql

2014-10-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164222#comment-14164222 ] Zhan Zhang edited comment on SPARK-3720 at 10/8/14 9:45 PM

[jira] [Comment Edited] (SPARK-3720) support ORC in spark sql

2014-10-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164224#comment-14164224 ] Zhan Zhang edited comment on SPARK-3720 at 10/8/14 9:47 PM

[jira] [Commented] (SPARK-3633) Fetches failure observed after SPARK-2711

2014-09-25 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147939#comment-14147939 ] Zhan Zhang commented on SPARK-3633: --- Increasing timeout does not help my case either. I

Re: sortByKey trouble

2014-09-24 Thread Zhan Zhang
Try this: import org.apache.spark.SparkContext._ Thanks. Zhan Zhang On Sep 24, 2014, at 6:13 AM, david david...@free.fr wrote: thank's i've already try this solution but it does not compile (in Eclipse) I'm surprise to see that in Spark-shell, sortByKey works fine on 2 solutions

Re: Converting one RDD to another

2014-09-23 Thread Zhan Zhang
Here is my understanding: def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T] = { if (num == 0) { // if 0, return empty array Array.empty } else { mapPartitions { items => // map each partition to a new one whose iterator consists of the single queue,
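
The preview above is cut off, so the following is only a minimal sketch of the idea it describes, written against plain Scala collections rather than Spark. It assumes that each partition is reduced to a bounded queue of its `num` smallest elements and that the queues are then merged and sorted. The names `TakeOrderedSketch` and `smallestOf`, and the use of `Seq[Seq[T]]` to stand in for an RDD's partitions, are hypothetical and are not taken from the Spark source.

```scala
import scala.collection.mutable

object TakeOrderedSketch {
  // Keep only the `num` smallest elements seen in `items`.
  // The queue is a max-heap under `ord`, so `head` is the largest kept element
  // and is the one evicted when a smaller element arrives.
  def smallestOf[T](items: Iterator[T], num: Int)(implicit ord: Ordering[T]): mutable.PriorityQueue[T] = {
    val queue = new mutable.PriorityQueue[T]()(ord)
    items.foreach { item =>
      if (queue.size < num) {
        queue.enqueue(item)
      } else if (ord.lt(item, queue.head)) {
        queue.dequeue()
        queue.enqueue(item)
      }
    }
    queue
  }

  // `partitions` stands in for the partitions of an RDD (an assumption of this sketch).
  def takeOrdered[T](partitions: Seq[Seq[T]], num: Int)(implicit ord: Ordering[T]): Seq[T] = {
    if (num <= 0 || partitions.isEmpty) {
      Seq.empty
    } else {
      // Step 1: one bounded queue per partition.
      val perPartition = partitions.map(p => smallestOf(p.iterator, num))
      // Step 2: merge the queues pairwise, keeping at most `num` elements each time.
      val merged = perPartition.reduce { (q1, q2) =>
        smallestOf(q1.iterator ++ q2.iterator, num)
      }
      // Step 3: sort the survivors in ascending order under `ord`.
      merged.toSeq.sorted(ord)
    }
  }
}
```

Because every partition contributes at most `num` elements, the merge step handles only a small amount of data regardless of the partition sizes. Passing a reversed ordering, for example `Ordering[Int].reverse`, selects the largest elements instead.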

Re: how long does it take executing ./sbt/sbt assembly

2014-09-23 Thread Zhan Zhang
Definitely something wrong. For me, 10 to 30 minutes. Thanks. Zhan Zhang On Sep 23, 2014, at 10:02 PM, christy 760948...@qq.com wrote: This process began yesterday and it has already run for more than 20 hours. Is it normal? Any one has the same problem? No error throw out yet

[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server

2014-09-18 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139396#comment-14139396 ] Zhan Zhang commented on SPARK-1537: --- Do you have any update on this, or any schedule

[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2014-09-17 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137759#comment-14137759 ] Zhan Zhang commented on SPARK-2883: --- I am starting to prototyping the last feature

[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2014-09-16 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136102#comment-14136102 ] Zhan Zhang commented on SPARK-2883: --- There are several features to be supported. 1st

[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2014-09-16 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136481#comment-14136481 ] Zhan Zhang commented on SPARK-2883: --- Sorry, I mean column pruning. Currently what I see

[jira] [Issue Comment Deleted] (SPARK-2883) Spark Support for ORCFile format

2014-09-16 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-2883: -- Comment: was deleted (was: Sorry, I mean column pruning. Currently what I see is that HiveTableScan

Re: spark RDD join Error

2014-09-04 Thread Zhan Zhang
Try this: import org.apache.spark.SparkContext._ Thanks. Zhan Zhang On Sep 4, 2014, at 4:36 PM, Veeranagouda Mukkanagoudar veera...@gmail.com wrote: I am planning to use RDD join operation, to test out i was trying to compile some test code, but am getting following compilation error

Re: Running Wordcount on large file stucks and throws OOM exception

2014-09-03 Thread Zhan Zhang
://sandbox.hortonworks.com:8020/tmp/wordcount) Thanks. Zhan Zhang On Aug 26, 2014, at 12:35 AM, motte1988 wir12...@studserv.uni-leipzig.de wrote: Hello, it's me again. Now I've got an explanation for the behaviour. It seems that the driver memory is not large enough to hold the whole result set

[jira] [Commented] (SPARK-2706) Enable Spark to support Hive 0.13

2014-09-02 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118979#comment-14118979 ] Zhan Zhang commented on SPARK-2706: --- send out pull request https://github.com/apache

RE: Working Formula for Hive 0.13?

2014-08-29 Thread Zhan Zhang
issue to spark-2706 soon. Thanks. Zhan Zhang -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Working-Formula-for-Hive-0-13-tp7551p8118.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com

[jira] [Updated] (SPARK-2706) Enable Spark to support Hive 0.13

2014-08-27 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-2706: -- Attachment: v1.0.2.diff This is the patch against v1.0.2. I didn't fix the test cases. The regular

Re: Configuration for big worker nodes

2014-08-22 Thread Zhan Zhang
I think it depends on your job. My personal experiences when I run TB data. spark got loss connection failure if I use big JVM with large memory, but with more executors with small memory, it can run very smoothly. I was running spark on yarn. Thanks. Zhan Zhang On Aug 21, 2014, at 3:42 PM

[jira] [Commented] (HIVE-4523) round() function with specified decimal places not consistent with mysql

2014-08-21 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105103#comment-14105103 ] Zhan Zhang commented on HIVE-4523: -- I also met the same problem with the new round UDF

[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server

2014-08-20 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104881#comment-14104881 ] Zhan Zhang commented on SPARK-1537: --- Thanks for sharing this. Do you have concrete plan

Re: Web UI doesn't show some stages

2014-08-20 Thread Zhan Zhang
the reduceByKey because it is not cached. I agree with you it is very confusing. Thanks. Zhan Zhang The f On Aug 20, 2014, at 2:28 PM, Patrick Wendell pwend...@gmail.com wrote: The reason is that some operators get pipelined into a single stage. rdd.map(XX).filter(YY) - this executes in a single

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-18 Thread Zhan Zhang
( )).map(word => (word, 1)).reduceByKey((a, b) => a + b) counts.saveAsTextFile("file") // anyway, you don't want to collect results to the master; instead, put them in a file. Thanks. Zhan Zhang On Aug 16, 2014, at 9:18 AM, Jerry Ye jerr...@gmail.com wrote: The job ended up running overnight

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-18 Thread Zhan Zhang
Not sure exactly how you use it. My understanding is that in spark it would be better to keep the overhead of driver as less as possible. Is it possible to broadcast trie to executors, do computation there and then aggregate the counters (??) in reduct phase? Thanks. Zhan Zhang On Aug 18

Re: NullPointerException when connecting from Spark to a Hive table backed by HBase

2014-08-18 Thread Zhan Zhang
String HBASE_TABLE_NAME = "hbase.table.name"; Thanks. Zhan Zhang On Aug 17, 2014, at 11:39 PM, Cesar Arevalo ce...@zephyrhealthinc.com wrote: HadoopRDD -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain

Re: Bug or feature? Overwrite broadcasted variables.

2014-08-18 Thread Zhan Zhang
. Zhan Zhang On Aug 18, 2014, at 11:26 AM, Peng Cheng pc...@uow.edu.au wrote: I'm curious to see that if you declare broadcasted wrapper as a var, and overwrite it in the driver program, the modification can have stable impact on all transformations/actions defined BEFORE the overwrite

[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2014-08-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099275#comment-14099275 ] Zhan Zhang commented on SPARK-2883: --- Spark with Hive12 can operate Orc table through

[jira] [Created] (HIVE-7743) appendReadColumns should have sanity check

2014-08-15 Thread Zhan Zhang (JIRA)
Zhan Zhang created HIVE-7743: Summary: appendReadColumns should have sanity check Key: HIVE-7743 URL: https://issues.apache.org/jira/browse/HIVE-7743 Project: Hive Issue Type: Bug

Re: Support for ORC Table in Shark/Spark

2014-08-14 Thread Zhan Zhang
I tried with simple spark-hive select and insert, and it works. But to directly manipulate the ORCFile through RDD, spark has to be upgraded to support hive-0.13 first. Because some ORC API is not exposed until Hive-0.12. Thanks. Zhan Zhang On Aug 11, 2014, at 10:23 PM, vinay.kash

Re: Support for ORC Table in Shark/Spark

2014-08-14 Thread Zhan Zhang
Yes. You are right, but I tried old hadoopFile for OrcInputFormat. In hive12, OrcStruct is not exposing its api, so spark cannot access it. With Hive13, RDD can read from OrcFile. Btw, I didn’t see ORCNewOutputFormat in hive-0.13. Direct RDD manipulation (Hive13) val inputRead =

Re: Support for ORC Table in Shark/Spark

2014-08-14 Thread Zhan Zhang
I agree. We need the support similar to parquet file for end user. That’s the purpose of Spark-2883. Thanks. Zhan Zhang On Aug 14, 2014, at 11:42 AM, Yin Huai huaiyin@gmail.com wrote: I feel that using hadoopFile and saveAsHadoopFile to read and write ORCFile are more towards

[jira] [Commented] (SPARK-2706) Enable Spark to support Hive 0.13

2014-08-12 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094810#comment-14094810 ] Zhan Zhang commented on SPARK-2706: --- It may caused by hive side change, in the TestHive

Re: Spark testsuite error for hive 0.13.

2014-08-12 Thread Zhan Zhang
Problem solved by a workaround with create database and use database. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-testsuite-error-for-hive-0-13-tp7807p7819.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Spark testsuite error for hive 0.13.

2014-08-11 Thread Zhan Zhang
I am trying to change spark to support hive-0.13, but always met following problem when running the test. My feeling is the test setup may need to change, but don't know exactly. Who has the similar issue or is able to shed light on it? 13:50:53.331 ERROR org.apache.hadoop.hive.ql.Driver: FAILED:

Re: Spark testsuite error for hive 0.13.

2014-08-11 Thread Zhan Zhang
Thanks Sean, I change both the API and version because there are some incompatibility with hive-0.13, and actually can do some basic operation with the real hive environment. But the test suite always complain with no default database message. No clue yet. -- View this message in context:

[jira] [Updated] (SPARK-2706) Enable Spark to support Hive 0.13

2014-08-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-2706: -- Attachment: hive.diff mvn -Phive -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

[jira] [Updated] (SPARK-2706) Enable Spark to support Hive 0.13

2014-08-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-2706: -- Attachment: (was: hive.diff) Enable Spark to support Hive 0.13

[jira] [Updated] (SPARK-2706) Enable Spark to support Hive 0.13

2014-08-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-2706: -- Attachment: hive.diff Patch to the latest spark trunk. I only test with following compilation mvn

[jira] [Issue Comment Deleted] (SPARK-2706) Enable Spark to support Hive 0.13

2014-08-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-2706: -- Comment: was deleted (was: mvn -Phive -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
The API change seems not major. I have locally change it and compiled, but not test yet. The major problem is still how to solve the hive-exec jar dependency. I am willing to help on this issue. Is it better stick to the same way as hive-0.12 until hive-exec is cleaned enough to switch back? --

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
I can compile with no error, but my patch also includes other stuff. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Working-Formula-for-Hive-0-13-tp7551p7775.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
Here is the patch. Please ignore the pom.xml related change, which just for compiling purpose. I need to further work on this one based on Wandou's previous work. -- View this message in context:

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
Sorry, forget to upload files. I have never posted before :) hive.diff http://apache-spark-developers-list.1001551.n3.nabble.com/file/n/hive.diff -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Working-Formula-for-Hive-0-13-tp7551p.html

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
Attached the diff the PR SPARK-2706. I am currently working on this problem. If somebody are also working on this, we can share the load. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Working-Formula-for-Hive-0-13-tp7551p7782.html Sent from the

[jira] [Created] (SPARK-2883) Spark Support for ORCFile format

2014-08-06 Thread Zhan Zhang (JIRA)
Zhan Zhang created SPARK-2883: - Summary: Spark Support for ORCFile format Key: SPARK-2883 URL: https://issues.apache.org/jira/browse/SPARK-2883 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server

2014-08-05 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086657#comment-14086657 ] Zhan Zhang commented on SPARK-1537: --- I am also interested in it and trying to integrate

[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server

2014-08-05 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086808#comment-14086808 ] Zhan Zhang commented on SPARK-1537: --- Do you mind sharing your thoughts, design document

Re: Spark REPL question

2014-04-17 Thread Zhan Zhang
Thanks a lot. By spins up, do you mean using the same directory, specified by following? /** Local directory to save .class files too */ val outputDir = { val tmp = System.getProperty("java.io.tmpdir") val rootDir = new SparkConf().get("spark.repl.classdir", tmp)
