[jira] [Updated] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2016-02-24 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14801: --- Status: Patch Available (was: In Progress) > Enhance the Spark-HBase connector catalog with j

[jira] [Updated] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2016-02-24 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14801: --- Status: In Progress (was: Patch Available) > Enhance the Spark-HBase connector catalog with j

[jira] [Updated] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2016-02-24 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14801: --- Attachment: HBASE-14801-2.patch > Enhance the Spark-HBase connector catalog with json for

[jira] [Updated] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2016-02-24 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14801: --- Attachment: (was: HBASE-14801-2.patch) > Enhance the Spark-HBase connector catalog with j

Re: ORC file writing hangs in pyspark

2016-02-23 Thread Zhan Zhang
Hi James, You can try writing with another format, e.g., parquet, to see whether it is an ORC-specific issue or a more generic one. Thanks. Zhan Zhang On Feb 23, 2016, at 6:05 AM, James Barney <jamesbarne...@gmail.com<mailto:jamesbarne...@gmail.com>> wrote: I'm trying to write

[jira] [Commented] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2016-02-23 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159806#comment-15159806 ] Zhan Zhang commented on HBASE-14801: Will update the scoreboard after the sanity test by server

[jira] [Updated] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2016-02-23 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14801: --- Attachment: HBASE-14801-2.patch > Enhance the Spark-HBase connector catalog with json for

[jira] [Commented] (SPARK-7009) Build assembly JAR via ant to avoid zip64 problems

2016-01-30 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15125205#comment-15125205 ] Zhan Zhang commented on SPARK-7009: --- Yes. This one is obsolete. > Build assembly JAR via ant to av

[jira] [Commented] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2016-01-26 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118266#comment-15118266 ] Zhan Zhang commented on HBASE-14801: Looks like most of the warnings do not apply to this patch. I

[jira] [Commented] (SPARK-11075) Spark SQL Thrift Server authentication issue on kerberized yarn cluster

2016-01-22 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113560#comment-15113560 ] Zhan Zhang commented on SPARK-11075: Duplicate of SPARK-5159? > Spark SQL Thrift Ser

[jira] [Updated] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2016-01-20 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14801: --- Attachment: HBASE-14801-1.patch > Enhance the Spark-HBase connector catalog with json for

[jira] [Updated] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2016-01-20 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14801: --- Status: Patch Available (was: Open) > Enhance the Spark-HBase connector catalog with json for

[jira] [Commented] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true

2016-01-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102183#comment-15102183 ] Zhan Zhang commented on SPARK-5159: --- What happens if a user has valid access to a table, which

[jira] [Comment Edited] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true

2016-01-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102183#comment-15102183 ] Zhan Zhang edited comment on SPARK-5159 at 1/15/16 5:50 PM: What happens

[jira] [Commented] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true

2016-01-14 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098734#comment-15098734 ] Zhan Zhang commented on SPARK-5159: --- This issue is definitely broken. But fixing it needs a complete

Review Request 42118: AMBARI-14601 Disable impersonation in spark hive support

2016-01-10 Thread Zhan Zhang
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319) Thanks, Zhan Zhang

[jira] [Created] (AMBARI-14601) Disable impersonation in spark

2016-01-10 Thread Zhan Zhang (JIRA)
Zhan Zhang created AMBARI-14601: --- Summary: Disable impersonation in spark Key: AMBARI-14601 URL: https://issues.apache.org/jira/browse/AMBARI-14601 Project: Ambari Issue Type: Bug

[jira] [Commented] (AMBARI-14601) Disable impersonation in spark

2016-01-10 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/AMBARI-14601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091161#comment-15091161 ] Zhan Zhang commented on AMBARI-14601: - Currently spark thriftserver cannot do impersonation

[jira] [Updated] (AMBARI-14601) Disable impersonation in spark

2016-01-10 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/AMBARI-14601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated AMBARI-14601: Attachment: AMBARI-14601.patch set hive.server2.enable.doAs to false > Disable impersonat

Dr.appointment this afternoon and WFH tomorrow for another Dr. appointment (EOM)

2016-01-07 Thread Zhan Zhang
- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org

[jira] [Commented] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2016-01-06 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086414#comment-15086414 ] Zhan Zhang commented on HBASE-14801: I will start working on this. Please let me know if anyone

[jira] [Updated] (HBASE-14796) Enhance the Gets in the connector

2015-12-28 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14796: --- Attachment: HBASE-14796-1.patch solve review comments > Enhance the Gets in the connec

Re: Problem using limit clause in spark sql

2015-12-23 Thread Zhan Zhang
to be materialized in each partition, because some partitions may not have enough records, and sometimes one is even empty. I didn’t see any straightforward workaround for this. Thanks. Zhan Zhang On Dec 23, 2015, at 5:32 PM, 汪洋 <tiandiwo...@icloud.com<mailto:tiandiwo...@icloud.com&g
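
The point about partitions that hold too few records (or none) can be illustrated with a small, Spark-free sketch in plain Python. The partition layout and the function name `limit_across_partitions` are hypothetical, and this is not how Spark SQL implements LIMIT internally; it only shows why every partition has to be consulted before the final cut is made.

```python
from typing import Iterable, List, Sequence


def limit_across_partitions(partitions: Sequence[Iterable[int]], n: int) -> List[int]:
    """Take up to n rows from every partition, then keep the first n overall.

    No single partition can be trusted to hold n rows, so each one
    contributes whatever it has (possibly nothing) before the final cut.
    """
    candidates: List[int] = []
    for part in partitions:
        taken = 0
        for row in part:
            if taken == n:
                break  # this partition has already contributed n rows
            candidates.append(row)
            taken += 1
    return candidates[:n]


# Hypothetical data: one empty partition, two short ones, one larger one.
partitions = [[], [1, 2], [3, 4, 5, 6], [7]]
result = limit_across_partitions(partitions, 5)
print(result)
```

In this sketch the first partition contributes nothing and the second only two rows, so the limit of five is satisfied only by reading further partitions.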

Re: Unable to create hive table using HiveContext

2015-12-23 Thread Zhan Zhang
You are using embedded mode, which will create the db locally (in your case, maybe the db has been created, but you do not have the right permission?). To connect to a remote metastore, hive-site.xml has to be correctly configured. Thanks. Zhan Zhang On Dec 23, 2015, at 7:24 AM, Soni spark

[jira] [Updated] (HBASE-14796) Enhance the Gets in the connector

2015-12-23 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14796: --- Attachment: HBASE-14976.patch We have a use case where a bulkGet may consist of thousands of gets. Move

[jira] [Updated] (HBASE-14796) Enhance the Gets in the connector

2015-12-23 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14796: --- Release Note: spark.hbase.bulkGetSize in HBaseSparkConf is for grouping bulkGet, and default value

Re: DataFrameWriter.format(String) is there a list of options?

2015-12-23 Thread Zhan Zhang
Currently json, parquet, orc (in HiveContext), and text are natively supported. If you use avro or others, you have to include the package, which is not built into the Spark jar. Thanks. Zhan Zhang On Dec 23, 2015, at 8:57 AM, Christopher Brady <christopher.br...@oracle.com<mailto:christop

[jira] [Commented] (HBASE-14796) Enhance the Gets in the connector

2015-12-23 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070286#comment-15070286 ] Zhan Zhang commented on HBASE-14796: Thanks [~ted.m] for the quick review. It is reasonable to have

Re: Can SqlContext be used inside mapPartitions

2015-12-22 Thread Zhan Zhang
SQLContext lives on the driver side, and I don’t think you can use it in executors. How to provide lookup functionality in executors really depends on how you would use it. Thanks. Zhan Zhang On Dec 22, 2015, at 4:44 PM, SRK <swethakasire...@gmail.com> wrote: > Hi, > > Can SQL

Re: number limit of map for spark

2015-12-21 Thread Zhan Zhang
In what situation do you have such cases? If there is no shuffle, you can collapse all these functions into one, right? In the meantime, it is not recommended to collect all data to the driver. Thanks. Zhan Zhang On Dec 21, 2015, at 3:44 AM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID<mailto:

Re: number limit of map for spark

2015-12-21 Thread Zhan Zhang
application. Thanks. Zhan Zhang On Dec 21, 2015, at 10:43 AM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID<mailto:zchl.j...@yahoo.com.INVALID>> wrote: What is difference between repartition / collect and collapse ... Is collapse the same costly as collect or repartition ? Thank

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-21 Thread Zhan Zhang
. Thanks. Zhan Zhang Note that when sc is stopped, all resources are released (for example in yarn On Dec 20, 2015, at 2:59 PM, Jerry Lam <chiling...@gmail.com> wrote: > Hi Spark developers, > > I found that SQLContext.getOrCreate(sc: SparkContext) does not behave > correctly when

Re: Spark with log4j

2015-12-21 Thread Zhan Zhang
it, at application run time, you can log into the container’s box, and check the local cache of the container to find whether the log file exists or not (after the app terminates, these local cache files will be deleted as well). Thanks. Zhan Zhang On Dec 18, 2015, at 7:23 AM, Kalpesh Jadhav <kalpesh.

Re: spark-submit is ignoring "--executor-cores"

2015-12-21 Thread Zhan Zhang
BTW: It is not only a Yarn-webui issue. In capacity scheduler, vcore is ignored. If you want Yarn to honor vcore requests, you have to use DominantResourceCalculator as Saisai suggested. Thanks. Zhan Zhang On Dec 21, 2015, at 5:30 PM, Saisai Shao <sai.sai.s...@gmail.com<mailto:sai
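
The difference between a memory-only calculation and one that also honors vcores can be sketched in plain Python. This is a deliberately simplified model with hypothetical names and numbers, not the YARN scheduler's actual code: it only counts how many identical containers would fit on one node under each policy.

```python
def containers_that_fit(node_mem_mb: int, node_vcores: int,
                        req_mem_mb: int, req_vcores: int,
                        dominant: bool) -> int:
    """Count identical containers that fit on one node.

    dominant=False models a memory-only calculator: vcores are ignored.
    dominant=True models a calculator that also honors vcores, so the
    scarcer of the two resources limits the count.
    """
    by_memory = node_mem_mb // req_mem_mb
    if not dominant:
        return by_memory
    by_vcores = node_vcores // req_vcores
    return min(by_memory, by_vcores)


# Hypothetical node: 64 GB of memory and 16 vcores.
# Hypothetical request: 4 GB and 4 vcores per container.
memory_only = containers_that_fit(65536, 16, 4096, 4, dominant=False)
with_vcores = containers_that_fit(65536, 16, 4096, 4, dominant=True)
print(memory_only, with_vcores)
```

Under the memory-only policy the node is packed purely by memory, so the per-container vcore request has no effect on placement, which matches the behavior described in the message.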

[jira] [Updated] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-18 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14849: --- Attachment: HBASE-14849-2.patch Solve review comments. > Add option to set block cache to fa

[jira] [Updated] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-18 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14849: --- Release Note: For user configurable parameters for HBase datasources. Please refer

[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-18 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064656#comment-15064656 ] Zhan Zhang commented on HBASE-14849: [~ted_yu] Not very familiar with it. Could you clarify which doc

[jira] [Updated] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-17 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14849: --- Attachment: HBASE-14849-1.patch fix style check. The javadoc warning is not related to this jira

[jira] [Updated] (HBASE-14991) Fix the feature warning in scala code

2015-12-17 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14991: --- Attachment: HBASE-14991-1.patch Attach the same file to kick off the testing > Fix the feat

Re: About Spark On Hbase

2015-12-15 Thread Zhan Zhang
If you want dataframe support, you can refer to https://github.com/zhzhan/shc, which I am working on to integrate to HBase upstream with existing support. Thanks. Zhan Zhang On Dec 15, 2015, at 4:34 AM, censj <ce...@lotuseed.com<mailto:ce...@lotuseed.com>> wrote: hi,fight fa

Re: Spark big rdd problem

2015-12-15 Thread Zhan Zhang
You should be able to get the logs from yarn by “yarn logs -applicationId xxx”, where you can possibly find the cause. Thanks. Zhan Zhang On Dec 15, 2015, at 11:50 AM, Eran Witkon <eranwit...@gmail.com> wrote: > When running > val data = sc.wholeTextFile("someDir/*") d

[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058997#comment-15058997 ] Zhan Zhang commented on HBASE-14795: [~jmhsieh] Thanks for bringing this up. I am working on HBASE-14849

[jira] [Updated] (YARN-4445) Unify the term flowId and flowName in timeline v2 codebase

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated YARN-4445: - Attachment: YARN-4445-feature-YARN-2928.001.patch > Unify the term flowId and flowName in timeline

[jira] [Updated] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14849: --- Status: Open (was: Patch Available) > Add option to set block cache to false on SparkSQL executi

[jira] [Updated] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14849: --- Attachment: HBASE-14849.patch > Add option to set block cache to false on SparkSQL executi

[jira] [Updated] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14849: --- Status: Patch Available (was: Open) > Add option to set block cache to false on SparkSQL executi

[jira] [Updated] (YARN-4445) Unify the term flowId and flowName in timeline v2 codebase

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated YARN-4445: - Attachment: YARN-4445.patch > Unify the term flowId and flowName in timeline v2 codeb

[jira] [Updated] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14849: --- Attachment: HBASE-14849.patch Migrate hbase configuration to SparkConf, and some cleanup. >

[jira] [Updated] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14849: --- Status: Patch Available (was: Open) > Add option to set block cache to false on SparkSQL executi

[jira] [Commented] (YARN-4445) Unify the term flowId and flowName in timeline v2 codebase

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059212#comment-15059212 ] Zhan Zhang commented on YARN-4445: -- rename > Unify the term flowId and flowName in timeline v2 codeb

[jira] [Updated] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14849: --- Attachment: (was: HBASE-14849.patch) > Add option to set block cache to false on Spark

[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059509#comment-15059509 ] Zhan Zhang commented on HBASE-14795: [~jmhsieh] I am trying to figure out how to "r

[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059498#comment-15059498 ] Zhan Zhang commented on HBASE-14795: [~jmhsieh] My patch in HBASE-14849 will not fix those three

[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059502#comment-15059502 ] Zhan Zhang commented on HBASE-14795: My mistake. These warnings are different from those three

[jira] [Created] (HBASE-14991) Fix the feature warning in scala code

2015-12-15 Thread Zhan Zhang (JIRA)
Zhan Zhang created HBASE-14991: -- Summary: Fix the feature warning in scala code Key: HBASE-14991 URL: https://issues.apache.org/jira/browse/HBASE-14991 Project: HBase Issue Type: Bug

[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059533#comment-15059533 ] Zhan Zhang commented on HBASE-14795: [~jmhsieh] HBASE-14991 is opened for this, and the patch

[jira] [Commented] (HBASE-14991) Fix the feature warning in scala code

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059532#comment-15059532 ] Zhan Zhang commented on HBASE-14991: @Jonathan Hsieh Please review. > Fix the feature warn

[jira] [Commented] (HBASE-14991) Fix the feature warning in scala code

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059531#comment-15059531 ] Zhan Zhang commented on HBASE-14991: Enable feature option and fix feature warning. >

[jira] [Updated] (HBASE-14991) Fix the feature warning in scala code

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14991: --- Status: Patch Available (was: Open) > Fix the feature warning in scala c

[jira] [Updated] (HBASE-14991) Fix the feature warning in scala code

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14991: --- Attachment: HBASE-14991.patch > Fix the feature warning in scala c

Re: Spark big rdd problem

2015-12-15 Thread Zhan Zhang
There are two cases here. If the container is killed by yarn, you can increase jvm overhead. Otherwise, you have to increase the executor-memory if there is no memory leak happening. Thanks. Zhan Zhang On Dec 15, 2015, at 9:58 PM, Eran Witkon <eranwit...@gmail.com<mailto:eranwit...@gma

[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-15 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059555#comment-15059555 ] Zhan Zhang commented on HBASE-14849: I use following command, but didn't find any javadoc warning

Re: Multi-core support per task in Spark

2015-12-11 Thread Zhan Zhang
I noticed that it is configurable at the job level via spark.task.cpus. Any way to support it at the task level? Thanks. Zhan Zhang On Dec 11, 2015, at 10:46 AM, Zhan Zhang <zzh...@hortonworks.com> wrote: > Hi Folks, > > Is it possible to assign multiple cores per task and how? Suppo

Re: What is the relationship between reduceByKey and spark.driver.maxResultSize?

2015-12-11 Thread Zhan Zhang
I think you are fetching too many results to the driver. Typically, it is not recommended to collect much data to the driver. But if you have to, you can increase the driver memory when submitting jobs. Thanks. Zhan Zhang On Dec 11, 2015, at 6:14 AM, Tom Seddon <mr.tom.sed...@gmail.

[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-11 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053169#comment-15053169 ] Zhan Zhang commented on HBASE-14849: [~ted.m] Please feel free to assign to me. > Add option to

Multi-core support per task in Spark

2015-12-11 Thread Zhan Zhang
it make sense to add this feature. It may seem to make users worry about more configuration, but by default we can still do 1 core per task and only advanced users need to be aware of this feature. Thanks. Zhan Zhang - To unsubscribe

Re: Performance does not increase as the number of workers increasing in cluster mode

2015-12-11 Thread Zhan Zhang
set if you want to do some performance benchmarking. Thanks. Zhan Zhang On Dec 11, 2015, at 9:34 AM, Wei Da <xwd0...@qq.com<mailto:xwd0...@qq.com>> wrote: Hi, all I have done a test in different HW configurations of Spark 1.5.0. A KMeans algorithm has been run in four dif

[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-11 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053168#comment-15053168 ] Zhan Zhang commented on HBASE-14849: I suggest putting this type of configuration into SparkConf

Re: how to access local file from Spark sc.textFile("file:///path to/myfile")

2015-12-11 Thread Zhan Zhang
As Sean mentioned, you cannot refer to the local file on your remote machines (executors). One workaround is to copy the file to all machines in the same directory. Thanks. Zhan Zhang On Dec 11, 2015, at 10:26 AM, Lin, Hao <hao@finra.org<mailto:hao@finra.org&g

[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-10 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051856#comment-15051856 ] Zhan Zhang commented on HBASE-14795: [~ted.m] Thanks for reviewing this. I have updated

[jira] [Updated] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-10 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14795: --- Attachment: HBASE-14795-4.patch > Enhance the spark-hbase scan operati

[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-10 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052024#comment-15052024 ] Zhan Zhang commented on HBASE-14795: Thanks [~ted.m] and [~ted_yu] for the help. [~ted.m]If you don't

[jira] [Updated] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-09 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14795: --- Attachment: HBASE-14795-3.patch > Enhance the spark-hbase scan operati

[jira] [Updated] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14795: --- Attachment: HBASE-14795-2.patch > Enhance the spark-hbase scan operati

[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-07 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046013#comment-15046013 ] Zhan Zhang commented on HBASE-14795: Thanks for reviewing it. I have updated the revised one

[jira] [Updated] (HBASE-14789) Enhance the current spark-hbase connector

2015-12-07 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14789: --- Attachment: (was: HBASE-14795-1.patch) > Enhance the current spark-hbase connec

[jira] [Updated] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-07 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14795: --- Attachment: HBASE-14795-1.patch solve review comments > Enhance the spark-hbase scan operati

[jira] [Updated] (HBASE-14789) Enhance the current spark-hbase connector

2015-12-07 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14789: --- Attachment: HBASE-14795-1.patch solve review comments. > Enhance the current spark-hbase connec

[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-07 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046297#comment-15046297 ] Zhan Zhang commented on HBASE-14795: [~malaskat] I forgot to publish it. It is available now. Sorry

[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-03 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039729#comment-15039729 ] Zhan Zhang commented on HBASE-14795: Sure. I cannot submit review in review board, and will consult

[jira] [Updated] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-03 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14795: --- Attachment: 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch > Enhance the spark-hb

[jira] [Updated] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-03 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14795: --- Status: Patch Available (was: Open) Initial patch to consolidate hbase-spark scan operations

[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-11-19 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014494#comment-15014494 ] Zhan Zhang commented on HBASE-14795: [~malaskat] The work is in progress. I may send out the PR after

Re: DataFrames initial jdbc loading - will it be utilizing a filter predicate?

2015-11-18 Thread Zhan Zhang
When you have the following query, 'account === “acct1” will be pushed down to generate a new query with “where account = acct1” Thanks. Zhan Zhang On Nov 18, 2015, at 11:36 AM, Eran Medan <eran.me...@gmail.com<mailto:eran.me...@gmail.com>> wrote: I understand that the following ar
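
The idea of pushing a filter down into the generated query can be sketched in plain Python. This is an illustration only, not Spark's JDBC data source code: the function names, the table name `accounts`, and the column `balance` are hypothetical, and only simple equality filters are modeled.

```python
from typing import List, Sequence, Tuple, Union

Value = Union[str, int, float]


def quote(value: Value) -> str:
    """Render a literal for the sketch; strings get single quotes."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def build_pushdown_query(table: str, columns: Sequence[str],
                         filters: List[Tuple[str, Value]]) -> str:
    """Fold equality filters into the WHERE clause sent to the database,
    so that only matching rows are transferred."""
    select_list = ", ".join(columns) if columns else "*"
    query = f"SELECT {select_list} FROM {table}"
    if filters:
        conditions = " AND ".join(f"{col} = {quote(val)}" for col, val in filters)
        query += f" WHERE {conditions}"
    return query


# Hypothetical table and columns; the filter mirrors account === "acct1".
query = build_pushdown_query("accounts", ["account", "balance"],
                             [("account", "acct1")])
print(query)
```

Without the filter list the sketch falls back to a plain full-table SELECT, which is the case where every row would be fetched and filtered afterwards.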

[jira] [Commented] (SPARK-11704) Optimize the Cartesian Join

2015-11-14 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005682#comment-15005682 ] Zhan Zhang commented on SPARK-11704: [~maropu] You are right. I mean fetching from network is a big

[jira] [Commented] (SPARK-11705) Eliminate unnecessary Cartesian Join

2015-11-13 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004744#comment-15004744 ] Zhan Zhang commented on SPARK-11705: Simple steps to reproduce: import sqlContext.implicits._ case class

[jira] [Comment Edited] (SPARK-11704) Optimize the Cartesian Join

2015-11-13 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005136#comment-15005136 ] Zhan Zhang edited comment on SPARK-11704 at 11/14/15 5:16 AM: -- I think we

[jira] [Commented] (SPARK-11704) Optimize the Cartesian Join

2015-11-13 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005134#comment-15005134 ] Zhan Zhang commented on SPARK-11704: [~maropu] Maybe I misunderstand. If RDD2 is coming from

[jira] [Commented] (SPARK-11704) Optimize the Cartesian Join

2015-11-13 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005136#comment-15005136 ] Zhan Zhang commented on SPARK-11704: I think we can add a cleanup hook in SQLContext, and when

[jira] [Updated] (HBASE-14789) Enhance the current spark-hbase connector

2015-11-12 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14789: --- Summary: Enhance the current spark-hbase connector (was: Provide an alternative spark-hbase

[jira] [Created] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2015-11-12 Thread Zhan Zhang (JIRA)
Zhan Zhang created HBASE-14801: -- Summary: Enhance the Spark-HBase connector catalog with json format Key: HBASE-14801 URL: https://issues.apache.org/jira/browse/HBASE-14801 Project: HBase Issue

[jira] [Updated] (HBASE-14796) Enhance the Gets in the connector

2015-11-12 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14796: --- Summary: Enhance the Gets in the connector (was: Provide an alternative spark-hbase SQL

[jira] [Updated] (HBASE-14795) Enhance the spark-hbase scan operations

2015-11-12 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14795: --- Summary: Enhance the spark-hbase scan operations (was: Provide an alternative spark-hbase SQL

[jira] [Updated] (HBASE-14789) Enhance the current spark-hbase connector

2015-11-12 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated HBASE-14789: --- Description: This JIRA is to optimize the RDD construction in the current connector implementation

[jira] [Created] (SPARK-11704) Optimize the Cartesian Join

2015-11-12 Thread Zhan Zhang (JIRA)
Zhan Zhang created SPARK-11704: -- Summary: Optimize the Cartesian Join Key: SPARK-11704 URL: https://issues.apache.org/jira/browse/SPARK-11704 Project: Spark Issue Type: Bug Components

[jira] [Updated] (SPARK-11704) Optimize the Cartesian Join

2015-11-12 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-11704: --- Issue Type: Improvement (was: Bug) > Optimize the Cartesian J
