[jira] [Created] (SPARK-11705) Eliminate unnecessary Cartesian Join

2015-11-12 Thread Zhan Zhang (JIRA)
Zhan Zhang created SPARK-11705: -- Summary: Eliminate unnecessary Cartesian Join Key: SPARK-11705 URL: https://issues.apache.org/jira/browse/SPARK-11705 Project: Spark Issue Type: Improvement

Re: Proposal for SQL join optimization

2015-11-12 Thread Zhan Zhang
, and we can move the discussion there. Thanks. Zhan Zhang On Nov 11, 2015, at 6:16 PM, Xiao Li <gatorsm...@gmail.com<mailto:gatorsm...@gmail.com>> wrote: Hi, Zhan, That sounds really interesting! Please at me when you submit the PR. If possible, please also post the performanc

[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets

2015-11-11 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001356#comment-15001356 ] Zhan Zhang commented on HBASE-14796: The number does not matter here. Given the scenario, If we

Re: Spark Thrift doesn't start

2015-11-11 Thread Zhan Zhang
In hive-site.xml, you can remove all Tez-related configuration and try again. Thanks. Zhan Zhang On Nov 10, 2015, at 10:47 PM, DaeHyun Ryu <ry...@kr.ibm.com<mailto:ry...@kr.ibm.com>> wrote: Hi folks, I configured tez as execution engine of Hive. After done that

[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector

2015-11-11 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000943#comment-15000943 ] Zhan Zhang commented on HBASE-14789: [~ted.m] Thanks for the comments and open sub jiras for it. Do

[jira] [Commented] (HBASE-14795) Provide an alternative spark-hbase SQL implementations for Scan

2015-11-11 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001133#comment-15001133 ] Zhan Zhang commented on HBASE-14795: We can consolidate the two approaches by changing the way

[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets

2015-11-11 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001148#comment-15001148 ] Zhan Zhang commented on HBASE-14796: In addition, we want the full support in DataFrame level

[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets

2015-11-11 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001143#comment-15001143 ] Zhan Zhang commented on HBASE-14796: I agree that we don't expect the BulkGet to be huge. We should

[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets

2015-11-11 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001233#comment-15001233 ] Zhan Zhang commented on HBASE-14796: In theory I think performing the get in executors should have

Proposal for SQL join optimization

2015-11-11 Thread Zhan Zhang
are eliminated. Without such manual tuning, the query will never finish if a and c are big. But we should not rely on such manual optimization. Please provide your input. If they are both valid, I will open JIRAs for each. Than

[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector

2015-11-10 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999566#comment-14999566 ] Zhan Zhang commented on HBASE-14789: [~malaskat] Thanks for reviewing this. I agree that table write

[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector

2015-11-10 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999569#comment-14999569 ] Zhan Zhang commented on HBASE-14789: [~malaskat] By the way, I didn't mean "do no

Re: Anybody hit this issue in spark shell?

2015-11-09 Thread Zhan Zhang
Thanks Ted. I am using the latest master branch. I will give your build command a try. Thanks. Zhan Zhang On Nov 9, 2015, at 10:46 AM, Ted Yu <yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>> wrote: Which branch did you perform the build with ? I used the following comma

Anybody hit this issue in spark shell?

2015-11-09 Thread Zhan Zhang
Hi Folks, Has anybody hit the following issue? I used "mvn package -Phive -DskipTests" to build the package. Thanks. Zhan Zhang bin/spark-shell ... Spark context available as sc. error: error while loading QueryExecution, Missing dependency 'bad symbolic reference. A sign

Re: Support for views/ virtual tables in SparkSQL

2015-11-09 Thread Zhan Zhang
I think you can rewrite those TPC-H queries without views, for example by using registerTempTable. Thanks. Zhan Zhang On Nov 9, 2015, at 9:34 PM, Sudhir Menon <sme...@pivotal.io> wrote: > Team: > > Do we plan to add support for views/ virtual tables in SparkSQL anytime soon? > Tryin

[jira] [Created] (HBASE-14789) Provide an alternative spark-hbase connector

2015-11-09 Thread Zhan Zhang (JIRA)
Zhan Zhang created HBASE-14789: -- Summary: Provide an alternative spark-hbase connector Key: HBASE-14789 URL: https://issues.apache.org/jira/browse/HBASE-14789 Project: HBase Issue Type: Bug

[jira] [Updated] (HBASE-14789) Provide an alternative spark-hbase connector

2015-11-09 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14789: --- Attachment: shc.pdf Design doc attached. > Provide an alternative spark-hbase connec

[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector

2015-11-09 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997556#comment-14997556 ] Zhan Zhang commented on HBASE-14789: Preliminary implementation is available at https://github.com

[jira] [Commented] (SPARK-11562) Provide user an option to init SQLContext or HiveContext in spark shell

2015-11-06 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994661#comment-14994661 ] Zhan Zhang commented on SPARK-11562: Thanks [~jerrylam] report the issue and provide the suggestion

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Zhan Zhang
1:9083 HW11188:spark zzhang$ By the way, I don’t know whether there are any caveats to this workaround. Thanks. Zhan Zhang On Nov 6, 2015, at 2:40 PM, Jerry Lam <chiling...@gmail.com<mailto:chiling...@gmail.com>> wrote: Hi Zhan, I don’t use HiveContext features at

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Zhan Zhang
I agree, with a minor change: add a config that provides the option to init SQLContext or HiveContext, with HiveContext as the default, instead of bypassing when hitting the Exception. Thanks. Zhan Zhang On Nov 6, 2015, at 2:53 PM, Ted Yu <yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>&g

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Zhan Zhang
If your assembly jar includes the Hive jar, HiveContext will be used. Typically, HiveContext has more functionality than SQLContext. In what case do you have to use SQLContext for something that cannot be done by HiveContext? Thanks. Zhan Zhang On Nov 6, 2015, at 10:43 AM, Jerry Lam <chiling...@gmail.

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Zhan Zhang
Hi Jerry, https://issues.apache.org/jira/browse/SPARK-11562 is created for the issue. Thanks. Zhan Zhang On Nov 6, 2015, at 3:01 PM, Jerry Lam <chiling...@gmail.com<mailto:chiling...@gmail.com>> wrote: Hi Zhan, Thank you for providing a workaround! I will try this out but I ag

[jira] [Created] (SPARK-11562) Provide user an option to init SQLContext or HiveContext in spark shell

2015-11-06 Thread Zhan Zhang (JIRA)
Zhan Zhang created SPARK-11562: -- Summary: Provide user an option to init SQLContext or HiveContext in spark shell Key: SPARK-11562 URL: https://issues.apache.org/jira/browse/SPARK-11562 Project: Spark

Re: Vague Spark SQL error message with saveAsParquetFile

2015-11-03 Thread Zhan Zhang
It looks like a JVM was killed or ran out of memory. You can check the logs to find the real cause. Thanks. Zhan Zhang On Nov 3, 2015, at 9:23 AM, YaoPau <jonrgr...@gmail.com<mailto:jonrgr...@gmail.com>> wrote: java.io.FileNotFoun

Re: Upgrade spark cluster to latest version

2015-11-03 Thread Zhan Zhang
Spark is a client library. You can just download the latest release or build it on your own, and replace your existing one without changing your existing cluster. Thanks. Zhan Zhang On Nov 3, 2015, at 3:58 PM, roni <roni.epi...@gmail.com<mailto:roni.epi...@gmail.com>> wrote: Hi S

[jira] [Commented] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

2015-10-21 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967456#comment-14967456 ] Zhan Zhang commented on SPARK-11087: [~patcharee] I tried again, used the step you provided

[jira] [Commented] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

2015-10-21 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1496#comment-1496 ] Zhan Zhang commented on SPARK-11087: [~patcharee] I use the embedded hive metastore without any

[jira] [Comment Edited] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

2015-10-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959599#comment-14959599 ] Zhan Zhang edited comment on SPARK-11087 at 10/15/15 8:58 PM: -- [~patcharee

[jira] [Commented] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

2015-10-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959599#comment-14959599 ] Zhan Zhang commented on SPARK-11087: [~patcharee] I try to duplicate your table as much as possible

[jira] [Commented] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

2015-10-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959342#comment-14959342 ] Zhan Zhang commented on SPARK-11087: [~patcharee] I tried a simple case with partition and predicate

[jira] [Commented] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

2015-10-14 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957301#comment-14957301 ] Zhan Zhang commented on SPARK-11087: I will take a look at this one. > spark.sql.orc.filterPushd

Re: sql query orc slow

2015-10-13 Thread Zhan Zhang
the JIRA number? Thanks. Zhan Zhang On Oct 13, 2015, at 1:01 AM, Patcharee Thongtra <patcharee.thong...@uni.no<mailto:patcharee.thong...@uni.no>> wrote: Hi Zhan Zhang, Is my problem (which is ORC predicate is not generated from WHERE clause even though spark.sql.orc.filterPushdo

[jira] [Commented] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

2015-10-13 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1499#comment-1499 ] Zhan Zhang commented on SPARK-11087: no matter whether the table is sorted or not, the predicate

[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-12 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953450#comment-14953450 ] Zhan Zhang commented on HBASE-14406: [~malaskat] Can you add a test case as val results

Re: sql query orc slow

2015-10-09 Thread Zhan Zhang
versions of OrcInputFormat. The Hive path may use NewOrcInputFormat, but the Spark path uses OrcInputFormat. Thanks. Zhan Zhang On Oct 8, 2015, at 11:55 PM, patcharee <patcharee.thong...@uni.no> wrote: > Yes, the predicate pushdown is enabled, but still take longer time than the >

Re: sql query orc slow

2015-10-09 Thread Zhan Zhang
In your case, you manually set an AND pushdown, and the predicate is right based on your setting: leaf-0 = (EQUALS x 320) The right way is to enable the predicate pushdown as follows. sqlContext.setConf("spark.sql.orc.filterPushdown", "true") Thanks. Zhan Zhang On Oct 9
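
The idea behind pushing a predicate such as leaf-0 = (EQUALS x 320) down to the ORC reader can be illustrated with a small, self-contained Python sketch. This is not the ORC reader's API: the Stripe class, its field names, and scan_equals are hypothetical and only model the general technique of skipping blocks whose min/max statistics rule out a match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Hypothetical stand-in for a block of rows with min/max statistics on column x."""
    min_x: int
    max_x: int
    rows: List[int]


def may_contain(stripe: Stripe, value: int) -> bool:
    # An EQUALS predicate can only match if the value lies inside the stripe's range.
    return stripe.min_x <= value <= stripe.max_x


def scan_equals(stripes: List[Stripe], value: int) -> Tuple[List[int], int]:
    """Return the matching rows and the number of stripes that had to be read."""
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        if not may_contain(stripe, value):
            continue  # skipped using statistics alone, no rows are read
        stripes_read += 1
        matches.extend(x for x in stripe.rows if x == value)
    return matches, stripes_read


stripes = [
    Stripe(0, 100, [5, 50, 100]),
    Stripe(300, 400, [300, 320, 320, 400]),
    Stripe(500, 900, [500, 900]),
]
matches, stripes_read = scan_equals(stripes, 320)
```

With the illustrative data above, only the middle stripe is read. Without a pushed-down predicate, every stripe would be read and filtered afterwards, which is consistent with the slow query described in this thread.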

Re: sql query orc slow

2015-10-09 Thread Zhan Zhang
That is weird. Unfortunately, there is no debug info available on this part. Can you please open a JIRA to add some debug information on the driver side? Thanks. Zhan Zhang On Oct 9, 2015, at 10:22 AM, patcharee <patcharee.thong...@uni.no<mailto:patcharee.thong...@uni.no>> w

Re: sql query orc slow

2015-10-08 Thread Zhan Zhang
Hi Patcharee, Did you enable the predicate pushdown in the second method? Thanks. Zhan Zhang On Oct 8, 2015, at 1:43 AM, patcharee <patcharee.thong...@uni.no> wrote: > Hi, > > I am using spark sql 1.5 to query a hive table stored as partitioned orc > file. We have the to

Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhan Zhang
It should be similar to other Hadoop jobs. You need the Hadoop configuration on your client machine, and HADOOP_CONF_DIR in Spark must point to that configuration. Thanks Zhan Zhang On Sep 22, 2015, at 6:37 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID<mailto:zchl.j...@yahoo.com.INVALID&g

Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhan Zhang
, the former is used to access hdfs, and the latter is used to launch application on top of yarn. Then in the spark-env.sh, you add export HADOOP_CONF_DIR=/etc/hadoop/conf. Thanks. Zhan Zhang On Sep 22, 2015, at 8:14 PM, Zhiliang Zhu <zchl.j...@yahoo.com<mailto:zchl.j...@yahoo.com>> wro

Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhan Zhang
. Zhan Zhang On Sep 22, 2015, at 7:49 PM, Zhiliang Zhu <zchl.j...@yahoo.com<mailto:zchl.j...@yahoo.com>> wrote: Hi Zhan, Thanks very much for your help comment. I also view it would be similar to hadoop job submit, however, I was not deciding whether it is like that when it comes to spar

Re: HDP 2.3 support for Spark 1.5.x

2015-09-22 Thread Zhan Zhang
Hi Krishna, For the time being, you can download from upstream, and it should run OK on HDP 2.3. For HDP-specific problems, you can ask in the Hortonworks forum. Thanks. Zhan Zhang On Sep 22, 2015, at 3:42 PM, Krishna Sankar <ksanka...@gmail.com<mailto:ksanka...@gmail.com>>

Re: PrunedFilteredScan does not work for UDTs and Struct fields

2015-09-19 Thread Zhan Zhang
It looks complicated, but I think it would work. Thanks. Zhan Zhang From: Richard Eggert <richard.egg...@gmail.com> Sent: Saturday, September 19, 2015 3:59 PM To: User Subject: PrunedFilteredScan does not work for UDTs and Struct fields I defined my own rela

Re: spark-shell 1.5 doesn't seem to work in local mode

2015-09-19 Thread Zhan Zhang
It does not matter whether you start Spark in local mode or another mode. If you have hdfs-site.xml somewhere and the Spark configuration points to that config, you will read/write to HDFS. Thanks. Zhan Zhang From: Madhu <ma...@madhu.com> Sent: Sa

[jira] [Commented] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty

2015-09-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746164#comment-14746164 ] Zhan Zhang commented on SPARK-10623: It is caused by the SearchArgument.Builder is not correctly

[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-09-10 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740194#comment-14740194 ] Zhan Zhang commented on HBASE-14406: Following condition will result in the same filter. It will have

[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-09-10 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14406: --- Assignee: (was: Zhan Zhang) > The dataframe datasource filter is wrong, and will result in d

[jira] [Created] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-09-10 Thread Zhan Zhang (JIRA)
Zhan Zhang created HBASE-14406: -- Summary: The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior Key: HBASE-14406 URL: https://issues.apache.org/jira/browse/HBASE-14406

[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-09-10 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740197#comment-14740197 ] Zhan Zhang commented on HBASE-14406: Ted Malaska [~malaskat]'s response On one you 100% right. We're

[jira] [Commented] (SPARK-10304) Partition discovery does not throw an exception if the dir structure is invalid

2015-09-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735710#comment-14735710 ] Zhan Zhang commented on SPARK-10304: Did more investigation. Currently all files are included

[jira] [Commented] (SPARK-10304) Partition discovery does not throw an exception if the dir structure is valid

2015-08-31 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723851#comment-14723851 ] Zhan Zhang commented on SPARK-10304: [~yhuai] I tried to reproduce the problem with the same

[jira] [Commented] (SPARK-10304) Partition discovery does not throw an exception if the dir structure is valid

2015-08-31 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723994#comment-14723994 ] Zhan Zhang commented on SPARK-10304: [~yhuai] I think the NPE is caused by the directory has multiple

[jira] [Commented] (SPARK-10304) Partition discovery does not throw an exception if the dir structure is valid

2015-08-31 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724122#comment-14724122 ] Zhan Zhang commented on SPARK-10304: [~lian cheng] forget about my question. From the code

[jira] [Commented] (SPARK-10304) Partition discovery does not throw an exception if the dir structure is valid

2015-08-27 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717951#comment-14717951 ] Zhan Zhang commented on SPARK-10304: [~yhuai] Thanks for the information, and initial

[jira] [Commented] (SPARK-10304) Need to add a null check in unwrapperFor in HiveInspectors

2015-08-27 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716204#comment-14716204 ] Zhan Zhang commented on SPARK-10304: [~yhuai] Is the field.getFieldObjectInspector

[jira] [Commented] (SPARK-10304) Need to add a null check in unwrapperFor in HiveInspectors

2015-08-26 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715958#comment-14715958 ] Zhan Zhang commented on SPARK-10304: [~yhuai] NP. Will look at it. Need to add

Re: Error when saving a dataframe as ORC file

2015-08-23 Thread Zhan Zhang
If you are using spark-1.4.0, it is probably caused by SPARK-8458 <https://issues.apache.org/jira/browse/SPARK-8458>. Thanks. Zhan Zhang On Aug 23, 2015, at 12:49 PM, lostrain A <donotlikeworkingh...@gmail.com> wrote: Ted, Thanks for the suggestions. Actually

[jira] [Commented] (SPARK-5111) HiveContext and Thriftserver cannot work in secure cluster beyond hadoop2.5

2015-08-11 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682189#comment-14682189 ] Zhan Zhang commented on SPARK-5111: --- [~adkathu...@yahoo.com] Hive is upgrade to 1.2

[jira] [Commented] (SPARK-5111) HiveContext and Thriftserver cannot work in secure cluster beyond hadoop2.5

2015-08-07 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662227#comment-14662227 ] Zhan Zhang commented on SPARK-5111: --- Since the Hive upgrade is done, this JIRA is not valid

Re: Authentication Support with spark-submit cluster mode

2015-07-29 Thread Zhan Zhang
If you run it on YARN with Kerberos set up, you authenticate yourself with kinit before launching the job. Thanks. Zhan Zhang On Jul 28, 2015, at 8:51 PM, Anh Hong <hongnhat...@yahoo.com.INVALID> wrote: Hi, I'd like to remotely run spark-submit from a local

Re: Make off-heap store pluggable

2015-07-21 Thread Zhan Zhang
Hi Alexey, SPARK-6479 <https://issues.apache.org/jira/browse/SPARK-6479> is for the plugin API, and SPARK-6112 <https://issues.apache.org/jira/browse/SPARK-6112> is for hdfs plugin. Thanks. Zhan Zhang On Jul 21, 2015, at 10:56 AM, Alexey Goncharuk alexey.goncha...@gmail.commailto:alexey.goncha

[jira] [Commented] (SPARK-8501) ORC data source may give empty schema if an ORC file containing zero rows is picked for schema discovery

2015-07-02 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612643#comment-14612643 ] Zhan Zhang commented on SPARK-8501: --- Because in spark, we will not create the orc file

[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2015-06-26 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603166#comment-14603166 ] Zhan Zhang commented on SPARK-2883: --- [~philclaridge] Please refer to the test case

[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2015-06-26 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603164#comment-14603164 ] Zhan Zhang commented on SPARK-2883: --- [~biao luo] saveAsOrcFile and orcFile

[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2015-06-26 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603669#comment-14603669 ] Zhan Zhang commented on SPARK-2883: --- [~philclaridge] I try the spark-shell in local

[jira] [Commented] (SPARK-5111) HiveContext and Thriftserver cannot work in secure cluster beyond hadoop2.5

2015-06-22 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596244#comment-14596244 ] Zhan Zhang commented on SPARK-5111: --- [~bolke] Thanks for the feedback. I will take

[jira] [Commented] (SPARK-6112) Provide external block store support through HDFS RAM_DISK

2015-06-22 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596277#comment-14596277 ] Zhan Zhang commented on SPARK-6112: --- [~bghit] Here is one example link for the ramdisk

[jira] [Commented] (SPARK-6112) Provide external block store support through HDFS RAM_DISK

2015-06-20 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594935#comment-14594935 ] Zhan Zhang commented on SPARK-6112: --- Thanks [~arpitagarwal] for the detailed setup

[jira] [Commented] (SPARK-7009) Build assembly JAR via ant to avoid zip64 problems

2015-06-17 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590680#comment-14590680 ] Zhan Zhang commented on SPARK-7009: --- [~airhorns] Please refer to https

[jira] [Commented] (SPARK-7009) Build assembly JAR via ant to avoid zip64 problems

2015-06-16 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589108#comment-14589108 ] Zhan Zhang commented on SPARK-7009: --- The PR may be outdated, and not working against

[jira] [Updated] (SPARK-6112) Provide external block store support through HDFS RAM_DISK

2015-05-27 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-6112: -- Summary: Provide external block store support through HDFS RAM_DISK (was: Provide OffHeap support

Re: [SPAM] Customized Aggregation Query on Spark SQL

2015-04-30 Thread Zhan Zhang
One optimization is to reduce the shuffle by first aggregating locally (keeping only the max for each name), and then calling reduceByKey. Thanks. Zhan Zhang On Apr 24, 2015, at 10:03 PM, ayan guha <guha.a...@gmail.com> wrote: Here you go t = [[A,10,A10],[A,20,A20],[A,30,A30
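
The suggestion to aggregate locally before the shuffle can be sketched in plain Python, without Spark. This is a minimal illustration under stated assumptions: the function names local_max and merge_max, the two-partition layout, and the "B" records are hypothetical, and the lists stand in for RDD partitions.

```python
def local_max(partition):
    """Map-side step: keep only the record with the largest value per name in one partition."""
    best = {}
    for name, value, label in partition:
        if name not in best or value > best[name][0]:
            best[name] = (value, label)
    return best


def merge_max(partials):
    """Reduce-side step: merge the per-partition winners, as a reduce by key would."""
    best = {}
    for partial in partials:
        for name, candidate in partial.items():
            if name not in best or candidate[0] > best[name][0]:
                best[name] = candidate
    return best


# Two hypothetical partitions of (name, value, label) records.
partitions = [
    [("A", 10, "A10"), ("A", 20, "A20"), ("B", 5, "B5")],
    [("A", 30, "A30"), ("B", 40, "B40")],
]

partials = [local_max(p) for p in partitions]
records_shuffled = sum(len(p) for p in partials)
result = merge_max(partials)
```

Each partition forwards at most one record per name, so the amount of data crossing the shuffle is bounded by the number of distinct names per partition instead of the number of input rows.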

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-04-17 Thread Zhan Zhang
Hi Udit, By the way, would you mind sharing the whole log trace? Thanks. Zhan Zhang On Apr 17, 2015, at 2:26 PM, Udit Mehta <ume...@groupon.com> wrote: I am just trying to launch a spark shell and not do anything fancy. I got the binary distribution from apache and put

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-04-17 Thread Zhan Zhang
: For spark-1.3, you can use the binary distribution from apache. Thanks. Zhan Zhang On Apr 17, 2015, at 2:01 PM, Udit Mehta <ume...@groupon.com> wrote: I followed the steps described above and I still get this error: Error: Could not find or load main class

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-04-17 Thread Zhan Zhang
You probably want to first try the basic configuration to see whether it works, instead of setting SPARK_JAR to point to the HDFS location. This error is caused by ExecutorLauncher not being found on the class path, and is not HDP specific, I think. Thanks. Zhan Zhang On Apr 17, 2015, at 2:26 PM, Udit

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-04-17 Thread Zhan Zhang
[root@c6402 conf]# Thanks. Zhan Zhang On Apr 17, 2015, at 3:09 PM, Udit Mehta <ume...@groupon.com> wrote: Hi, This is the log trace: https://gist.github.com/uditmehta27/511eac0b76e6d61f8b47 On the yarn RM UI, I see : Error: Could not find or load main class

[jira] [Commented] (SPARK-5111) HiveContext and Thriftserver cannot work in secure cluster beyond hadoop2.5

2015-04-13 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493483#comment-14493483 ] Zhan Zhang commented on SPARK-5111: --- [~crystal_gaoyu] I am not sure. You may try

Re: Spark 1.3.0: Running Pi example on YARN fails

2015-04-13 Thread Zhan Zhang
-2041 spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041 This is an HDP-specific question, and you can move the topic to the HDP forum. Thanks. Zhan Zhang On Apr 13, 2015, at 3:00 AM, Zork Sail <zorks...@gmail.com> wrote: Hi Zhan, Alas setting: -Dhdp.version=2.2.0.0

[jira] [Commented] (SPARK-6479) Create off-heap block storage API (internal)

2015-04-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486143#comment-14486143 ] Zhan Zhang commented on SPARK-6479: --- [~rxin] I updated the doc. If you think the overall

[jira] [Updated] (SPARK-6479) Create off-heap block storage API (internal)

2015-04-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-6479: -- Attachment: spark-6479-tachyon.patch patch with Tachyon migration. Not complete patch, as it will add

[jira] [Updated] (SPARK-6479) Create off-heap block storage API (internal)

2015-04-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-6479: -- Attachment: SPARK-6479OffheapAPIdesign (1).pdf Add exception from implementation Create off-heap

[jira] [Comment Edited] (SPARK-2883) Spark Support for ORCFile format

2015-04-02 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393141#comment-14393141 ] Zhan Zhang edited comment on SPARK-2883 at 4/2/15 7:54 PM

[jira] [Comment Edited] (SPARK-6479) Create off-heap block storage API (internal)

2015-04-02 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393590#comment-14393590 ] Zhan Zhang edited comment on SPARK-6479 at 4/2/15 10:24 PM

[jira] [Updated] (SPARK-6479) Create off-heap block storage API (internal)

2015-04-02 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-6479: -- Attachment: SPARK-6479OffheapAPIdesign.pdf Add failure case handling overall design and example

[jira] [Commented] (SPARK-6112) Provide OffHeap support through HDFS RAM_DISK

2015-04-02 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393287#comment-14393287 ] Zhan Zhang commented on SPARK-6112: --- Design spec for API attached to SPARK-6479 and wait

[jira] [Updated] (SPARK-6479) Create off-heap block storage API (internal)

2015-04-02 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-6479: -- Attachment: SPARK-6479.pdf This is the updated version for offheap store internal api design. Create

[jira] [Comment Edited] (SPARK-6479) Create off-heap block storage API (internal)

2015-04-02 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393590#comment-14393590 ] Zhan Zhang edited comment on SPARK-6479 at 4/2/15 10:23 PM

[jira] [Commented] (SPARK-6479) Create off-heap block storage API (internal)

2015-04-02 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393590#comment-14393590 ] Zhan Zhang commented on SPARK-6479: --- [~rxin] Thanks for the feedback. I updated

[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2015-04-02 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393141#comment-14393141 ] Zhan Zhang commented on SPARK-2883: --- The following code demonstrates the usage of the orc

[jira] [Commented] (SPARK-3720) support ORC in spark sql

2015-04-02 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393146#comment-14393146 ] Zhan Zhang commented on SPARK-3720: --- [~iward] I have updated the patch with the new API

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-30 Thread Zhan Zhang
/spark-defaults.conf, add the following settings: spark.driver.extraJavaOptions -Dhdp.version=x spark.yarn.am.extraJavaOptions -Dhdp.version=x 3. In $SPARK_HOME/java-opts, add the following option: -Dhdp.version=x Thanks. Zhan Zhang On Mar 30, 2015, at 6:56 AM, Doug Balog

[jira] [Commented] (SPARK-6479) Create off-heap block storage API (internal)

2015-03-27 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384872#comment-14384872 ] Zhan Zhang commented on SPARK-6479: --- I have a short version for this API and will post

Re: 2 input paths generate 3 partitions

2015-03-27 Thread Zhan Zhang
Hi Rares, The number of partitions is controlled by the HDFS input format, and one file may have multiple partitions if it consists of multiple blocks. In your case, I think there is one file with 2 splits. Thanks. Zhan Zhang On Mar 27, 2015, at 3:12 PM, Rares Vernica <rvern...@gmail.com>
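
As a rough illustration of the point above, the sketch below estimates how many splits each input file yields from its size and the block size. It is a simplification, not the actual Hadoop input format logic: the helper name `estimate_splits`, the 128 MB block size, and the two file sizes are all assumptions chosen for the example.

```python
import math

def estimate_splits(file_sizes, block_size=128 * 1024 * 1024):
    """Estimate the number of splits per file (hypothetical helper).

    Simplified model: one split per block, at least one split per file.
    The real input format applies further rules not modeled here.
    """
    return [max(1, math.ceil(size / block_size)) for size in file_sizes]

# Two input paths (sizes are made up): one fits in a single block,
# the other spans two blocks.
sizes = [100 * 1024 * 1024, 200 * 1024 * 1024]
per_file = estimate_splits(sizes)
total_partitions = sum(per_file)
```

Under these assumed sizes the second file alone contributes two splits, which is how two input paths can end up as three partitions.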

Re: Can't access file in spark, but can in hadoop

2015-03-27 Thread Zhan Zhang
Probably a Guava version conflict. Which Spark version did you use, and which Hadoop version was it compiled against? Thanks. Zhan Zhang On Mar 27, 2015, at 12:13 PM, Johnson, Dale <daljohn...@ebay.com> wrote: Yes, I could recompile the hdfs client with more logging

RDD.map does not allowed to preservesPartitioning?

2015-03-26 Thread Zhan Zhang
[] | ShuffledRDD[2] at reduceByKey at console:25 [] +-(8) MapPartitionsRDD[1] at map at console:23 [] | ParallelCollectionRDD[0] at parallelize at console:21 [] Thanks. Zhan Zhang - To unsubscribe, e-mail: user

Re: RDD.map does not allowed to preservesPartitioning?

2015-03-26 Thread Zhan Zhang
with keeping key part untouched. Then mapValues may not be able to do this. Changing the code to allow this is trivial, but I don’t know whether there is some special reason behind this. Thanks. Zhan Zhang On Mar 26, 2015, at 2:49 PM, Jonathan Coveney <jcove...@gmail.com>
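
The concern in this thread can be sketched with a toy model in plain Python (no Spark involved). A function that reads the key but leaves it untouched keeps every record in the partition its key hashes to, while a function that changes the key does not. The modulo-hash partitioner and all helper names below are assumptions for illustration, not Spark's implementation. In Spark itself, `mapValues` passes only the value to the function, and `mapPartitions` accepts a `preservesPartitioning` flag.

```python
def partition_of(key, num_partitions):
    # Toy stand-in for a hash partitioner (assumption, not Spark's code).
    return hash(key) % num_partitions

def partition_by_key(pairs, num_partitions):
    # Place each (key, value) pair in the partition chosen by its key.
    parts = [[] for _ in range(num_partitions)]
    for k, v in pairs:
        parts[partition_of(k, num_partitions)].append((k, v))
    return parts

def map_within_partitions(parts, f):
    # Apply f to every record without moving records between partitions.
    return [[f(kv) for kv in part] for part in parts]

def is_still_partitioned(parts):
    # True if every record sits in the partition its key maps to.
    n = len(parts)
    return all(partition_of(k, n) == i
               for i, part in enumerate(parts) for k, _ in part)

parts = partition_by_key([(i, i * i) for i in range(16)], 4)

# Reads the key to compute the new value, but leaves the key untouched.
key_aware = map_within_partitions(parts, lambda kv: (kv[0], kv[0] + kv[1]))

# Changes the key, so records no longer match their partition.
key_changed = map_within_partitions(parts, lambda kv: (kv[0] + 1, kv[1]))
```

Here `is_still_partitioned(key_aware)` holds and `is_still_partitioned(key_changed)` does not. A general `map` cannot tell the two cases apart, so only the caller can assert that partitioning is preserved.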
