Thanks all for the quick response.
Thanks.
Zhan Zhang
On Mar 26, 2015, at 3:14 PM, Patrick Wendell pwend...@gmail.com wrote:
I think we have a version of mapPartitions that allows you to tell
Spark the partitioning is preserved:
https://github.com/apache/spark/blob/master/core/src/main
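For illustration, a minimal sketch (the pair RDD and the value transform are assumed):

// Keys are unchanged, so it is safe to tell Spark the partitioning is preserved.
val result = pairRdd.mapPartitions(
  iter => iter.map { case (k, v) => (k, v.toString) },
  preservesPartitioning = true)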
I solved this by increasing the PermGen memory size in the driver.
-XX:MaxPermSize=512m
Thanks.
Zhan Zhang
On Mar 25, 2015, at 10:54 AM, ÐΞ€ρ@Ҝ (๏̯͡๏)
deepuj...@gmail.com wrote:
I am facing same issue, posted a new thread. Please respond.
On Wed, Jan 14, 2015 at 4:38 AM
You can do it in $SPARK_HOME/conf/spark-defaults.conf
spark.driver.extraJavaOptions -XX:MaxPermSize=512m
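The same option can also be passed on the spark-submit command line; a sketch (the class and jar names are assumed):

./bin/spark-submit --driver-java-options "-XX:MaxPermSize=512m" --class MyApp my-app.jar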
Thanks.
Zhan Zhang
On Mar 25, 2015, at 7:25 PM, ÐΞ€ρ@Ҝ (๏̯͡๏)
deepuj...@gmail.com wrote:
Where and how do I pass this or other JVM arguments?
-XX:MaxPermSize
You can try to set it in spark-env.sh.
# - SPARK_LOG_DIR Where log files are stored. (Default:
${SPARK_HOME}/logs)
# - SPARK_PID_DIR Where the pid file is stored. (Default: /tmp)
Thanks.
Zhan Zhang
On Mar 24, 2015, at 12:10 PM, Anubhav Agarwal
anubha...@gmail.com
[
https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376361#comment-14376361
]
Zhan Zhang commented on SPARK-3720:
---
[~iward] Since this jira is duplicated to Spark
Probably the port is already used by another process, e.g., Hive. You can
change the port as below:
./sbin/start-thriftserver.sh --master yarn --executor-memory 512m --hiveconf
hive.server2.thrift.port=10001
Thanks.
Zhan Zhang
On Mar 23, 2015, at 12:01 PM, Neil Dev
neilk
[
https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-6112:
--
Attachment: SparkOffheapsupportbyHDFS.pdf
Design doc for HDFS off-heap support
Provide OffHeap support
[
https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-6479:
--
Attachment: SparkOffheapsupportbyHDFS.pdf
The design doc also includes stuff from SPARK-6112
Create
Thanks Reynold,
I agree with opening another JIRA to unify the block storage API. I have
uploaded the design doc to SPARK-6479 as well.
Thanks.
Zhan Zhang
On Mar 23, 2015, at 4:03 PM, Reynold Xin
r...@databricks.com wrote:
I created a ticket to separate the API
[
https://issues.apache.org/jira/browse/SPARK-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376978#comment-14376978
]
Zhan Zhang commented on SPARK-6479:
---
The current API may not be good enough as it has
Hi Patcharee,
It is an alpha feature in the HDP distribution, integrating ATS with the Spark
history server. If you are using upstream Spark, you can configure it as usual
without these settings. But other related configuration is still mandatory,
such as the hdp.version settings.
Thanks.
Zhan Zhang
[
https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-6112:
--
Summary: Provide OffHeap support through HDFS RAM_DISK (was: Leverage HDFS
RAM_DISK capacity)
Each RDD has multiple partitions, and each of them will produce one HDFS file
when saving output. I don’t think you are allowed to have multiple file handles
writing to the same HDFS file. You can still load multiple files into Hive
tables, right?
Thanks.
Zhan Zhang
On Mar 15, 2015, at 7:31 AM
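If a single output file is really required, a common workaround is to collapse the RDD to one partition before saving, at the cost of parallelism; a sketch (the path is assumed):

// One partition in, one HDFS file out.
rdd.coalesce(1).saveAsTextFile("hdfs:///tmp/single-output")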
It happens during function evaluation in the line search: the value is either
infinite or NaN, which may be caused by too large a step size. In the code, the
step is then halved.
Thanks.
Zhan Zhang
On Mar 13, 2015, at 2:41 PM, cjwang c...@cjwang.us wrote:
I am running LogisticRegressionWithLBFGS
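An illustrative sketch of the step-halving idea (not the actual Breeze line-search code; all names are made up):

def shrinkStep(f: Double => Double, x: Double, dir: Double): Double = {
  var step = 1.0
  // Halve the step until the function value at the trial point is finite.
  while (step > 1e-12 && (f(x + step * dir).isNaN || f(x + step * dir).isInfinity)) {
    step /= 2
  }
  step
}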
one partition.
iterPartition += 1
}
You can refer to RDD.take as an example.
Thanks.
Zhan Zhang
On Mar 9, 2015, at 3:41 PM, Shuai Zheng
szheng.c...@gmail.com wrote:
Hi All,
I am processing some time series data. For one day, it might have 500GB; then
for each hour
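A sketch of that pattern (Spark 1.x API; an RDD[String] named rdd is assumed), mirroring what RDD.take and toLocalIterator do internally:

(0 until rdd.partitions.length).foreach { p =>
  // Run one job per partition so only one partition is on the driver at a time.
  val part = sc.runJob(rdd, (iter: Iterator[String]) => iter.toArray, Seq(p),
    allowLocal = false).head
  part.foreach(println)  // e.g., process one hour of data at a time
}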
Do you mean “--hiveconf” (two dashes) instead of “-hiveconf” (one dash)?
Thanks.
Zhan Zhang
On Mar 6, 2015, at 4:20 AM, James alcaid1...@gmail.com wrote:
Hello,
I want to execute an HQL script through the `spark-sql` command; my script
contains:
```
ALTER TABLE xxx
DROP PARTITION
the link to see why the shell
failed in the first place.
Thanks.
Zhan Zhang
On Mar 6, 2015, at 9:59 AM, Todd Nist
tsind...@gmail.com wrote:
First, thanks to everyone for their assistance and recommendations.
@Marcelo
I applied the patch that you recommended and am now able
Sorry, my misunderstanding. It looks like it already worked. If you still hit
the hdp.version problem, you can try it :)
Thanks.
Zhan Zhang
On Mar 6, 2015, at 11:40 AM, Zhan Zhang
zzh...@hortonworks.com wrote:
You are using 1.2.1, right? If so, please add java-opts
You are using 1.2.1, right? If so, please add a java-opts file in the conf
directory and give it a try.
[root@c6401 conf]# more java-opts
-Dhdp.version=2.2.2.0-2041
Thanks.
Zhan Zhang
On Mar 6, 2015, at 11:35 AM, Todd Nist
tsind...@gmail.com wrote:
-Dhdp.version=2.2.0.0
/
Thanks.
Zhan Zhang
On Mar 5, 2015, at 11:09 AM, Marcelo Vanzin
van...@cloudera.com wrote:
It seems from the excerpt below that your cluster is set up to use the
Yarn ATS, and the code is failing in that path. I think you'll need to
apply the following patch to your
Zhan Zhang created AMBARI-9952:
--
Summary: Populate keytab to all spark components
Key: AMBARI-9952
URL: https://issues.apache.org/jira/browse/AMBARI-9952
Project: Ambari
Issue Type: Bug
It uses HashPartitioner to distribute the records to different partitions, and
the integer keys spread evenly across the output partitions. From the code,
each resulting partition will get a very similar number of records.
Thanks.
Zhan Zhang
On Mar 4, 2015, at 3:47 PM, Du Li
l...@yahoo
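A quick way to see this (a sketch with made-up numbers):

import org.apache.spark.SparkContext._
import org.apache.spark.HashPartitioner
val pairs = sc.parallelize(0 until 100000).map(k => (k, k))
val parted = pairs.partitionBy(new HashPartitioner(8))
// Each of the 8 partitions holds roughly 12,500 records.
parted.mapPartitions(it => Iterator(it.size)).collect()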
: org.apache.spark.sql.SchemaRDD =
SchemaRDD[3] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
Filter Contains(value#5, Restaurant)
HiveTableScan [key#4,value#5], (MetastoreRelation default, testtable, None),
None
scala
Thanks.
Zhan Zhang
On Mar 4, 2015, at 9:09 AM, Anusha Shamanur
anushas
Do you have enough resources in your cluster? You can check your resource
manager to see the usage.
Thanks.
Zhan Zhang
On Mar 3, 2015, at 8:51 AM, abhi
abhishek...@gmail.com wrote:
I am trying to run the Java class below with a YARN cluster, but it hangs in accepted
In YARN (cluster or client mode), you can access the Spark UI while the app is
running. After the app is done, you can still access it, but it needs some
extra setup for the history server.
Thanks.
Zhan Zhang
On Mar 3, 2015, at 10:08 AM, Ted Yu
yuzhih...@gmail.com wrote:
bq
[
https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343808#comment-14343808
]
Zhan Zhang commented on SPARK-6112:
---
Will start scoping it.
Leverage HDFS RAM_DISK
[
https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-6112:
--
Component/s: (was: Spark Core)
Block Manager
Leverage HDFS RAM_DISK capacity
Zhan Zhang created SPARK-6112:
-
Summary: Leverage HDFS RAM_DISK capacity to provide off_heap
feature similar to Tachyon
Key: SPARK-6112
URL: https://issues.apache.org/jira/browse/SPARK-6112
Project
[
https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-6112:
--
Component/s: Spark Core
Leverage HDFS RAM_DISK capacity to provide off_heap feature similar
[
https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-6112:
--
Component/s: (was: YARN)
Leverage HDFS RAM_DISK capacity to provide off_heap feature similar
[
https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-6112:
--
Description:
The HDFS Lazy_Persist policy provides the possibility to cache RDDs off-heap
in HDFS. We may
You don’t need to know the RDD dependencies to maximize parallelism. Internally
the scheduler will construct the DAG and trigger the execution if there are no
shuffle dependencies between the RDDs.
Thanks.
Zhan Zhang
On Feb 26, 2015, at 1:28 PM, Corey Nolet cjno...@gmail.com wrote:
Let's say I'm
What confused me is the statement “The final result is that rdd1 is
calculated twice.” Is that the expected behavior?
Thanks.
Zhan Zhang
On Feb 26, 2015, at 3:03 PM, Sean Owen
so...@cloudera.com wrote:
To distill this a bit further, I don't think you actually want
.saveAsHadoopFile(…)]
In this way, rdd1 will be calculated once, and the two saveAsHadoopFile calls
will happen concurrently.
Thanks.
Zhan Zhang
On Feb 26, 2015, at 3:28 PM, Corey Nolet
cjno...@gmail.com wrote:
What confused me is the statement of The final result is that rdd1
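A sketch of the pattern described above (the paths are assumed; saveAsTextFile is used for brevity):

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

rdd1.cache()  // cache so the two concurrent jobs share one computation
val s1 = Future { rdd1.saveAsTextFile("hdfs:///out1") }
val s2 = Future { rdd1.saveAsTextFile("hdfs:///out2") }
Await.ready(Future.sequence(Seq(s1, s2)), Duration.Inf)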
cores sitting idle.
For the OOM: increasing the memory size and the JVM memory overhead setting may
help here.
Thanks.
Zhan Zhang
On Feb 26, 2015, at 2:03 PM, Yana Kadiyska
yana.kadiy...@gmail.com wrote:
Imran, I have also observed the phenomenon of reducing the cores helping
When you use SQL (or the SchemaRDD/DataFrame API) to read data from Parquet,
the optimizer will do column pruning, predicate pushdown, etc. Thus you get the
benefits of Parquet's columnar format. After that, you can operate on the
SchemaRDD (DataFrame) like a regular RDD.
Thanks.
Zhan Zhang
On Feb 26
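A sketch (Spark 1.2-era API; the path, table, and column names are assumed):

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val events = sqlContext.parquetFile("hdfs:///data/events.parquet")
events.registerTempTable("events")
// Only col1 and col2 are read from disk; the filter is pushed down to Parquet.
val rows = sqlContext.sql("SELECT col1 FROM events WHERE col2 > 10")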
Currently in Spark, it looks like there is no easy way to know the
dependencies up front. They are resolved at run time.
Thanks.
Zhan Zhang
On Feb 26, 2015, at 4:20 PM, Corey Nolet
cjno...@gmail.com wrote:
Ted. That one I know. It was the dependency part I was curious about
On Feb
context initiates
YarnClusterSchedulerBackend instead of YarnClientSchedulerBackend, which I
think is the root cause.
Thanks.
Zhan Zhang
On Feb 25, 2015, at 1:53 PM, Zhan Zhang
zzh...@hortonworks.com wrote:
Hi Mate,
When you initialize the JavaSparkContext, you don’t
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329664#comment-14329664
]
Zhan Zhang commented on SPARK-1537:
---
[~vanzin] If you don't have bandwidth, or don't
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329649#comment-14329649
]
Zhan Zhang commented on SPARK-1537:
---
[~vanzin] Thanks for the comments. I don't
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329678#comment-14329678
]
Zhan Zhang commented on SPARK-1537:
---
[~vanzin] I declare integrate your code from
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329649#comment-14329649
]
Zhan Zhang edited comment on SPARK-1537 at 2/20/15 10:14 PM
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329700#comment-14329700
]
Zhan Zhang commented on SPARK-1537:
---
[~sowen] From the whole context, I believe you
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-1537:
--
Comment: was deleted
(was: [~sowen] By the way, I am not waiting for someone to give me the patch
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329704#comment-14329704
]
Zhan Zhang commented on SPARK-1537:
---
[~sowen] By the way, I am not waiting for someone
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329828#comment-14329828
]
Zhan Zhang commented on SPARK-1537:
---
[~sowen] In JIRA, we share the code so that other
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-1537:
--
Attachment: SPARK-1537.txt
High-level design doc for Spark ATS integration.
Add integration
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-1537:
--
Attachment: spark-1573.patch
Patch against v1.2.1
Add integration with Yarn's Application Timeline
[
https://issues.apache.org/jira/browse/SPARK-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326443#comment-14326443
]
Zhan Zhang commented on SPARK-5889:
---
https://github.com/apache/spark/pull/4676
remove
Zhan Zhang created SPARK-5889:
-
Summary: remove pid file in spark-daemon.sh after killing the
process.
Key: SPARK-5889
URL: https://issues.apache.org/jira/browse/SPARK-5889
Project: Spark
Issue
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326778#comment-14326778
]
Zhan Zhang commented on SPARK-1537:
---
I have sent a PR with WIP for people who
When you log in, you have root access. Then you can do “su hdfs” or switch to
any other account. Then you can create HDFS directories and change permissions,
etc.
Thanks.
Zhan Zhang
On Feb 11, 2015, at 11:28 PM, guxiaobo1982
guxiaobo1...@qq.com wrote:
Hi Zhan,
Yes, I found
You need the right HDFS account, e.g., hdfs, to create the directory and
assign permissions.
Thanks.
Zhan Zhang
On Feb 11, 2015, at 4:34 AM, guxiaobo1982
guxiaobo1...@qq.com wrote:
Hi Zhan,
My Single Node Cluster of Hadoop is installed by Ambari 1.7.0, I tried
Zhan Zhang created AMBARI-9583:
--
Summary: Add kerberos support for spark
Key: AMBARI-9583
URL: https://issues.apache.org/jira/browse/AMBARI-9583
Project: Ambari
Issue Type: Bug
[
https://issues.apache.org/jira/browse/AMBARI-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated AMBARI-9583:
---
Attachment: Ambari-9583.patch
patch for kerberos support
Add kerberos support for spark
[
https://issues.apache.org/jira/browse/AMBARI-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317191#comment-14317191
]
Zhan Zhang commented on AMBARI-9583:
ReviewBoard: https://reviews.apache.org/r/30896
[
https://issues.apache.org/jira/browse/AMBARI-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated AMBARI-9583:
---
Attachment: 0001-spark-kerberos-support.patch
Change the patch with right format
Add kerberos
Yes. You need to create xiaobogu under /user and grant the right permissions
to xiaobogu.
Thanks.
Zhan Zhang
On Feb 7, 2015, at 8:15 AM, guxiaobo1982
guxiaobo1...@qq.com wrote:
Hi Zhan Zhang,
With the pre-built version 1.2.0 of Spark against the yarn cluster installed
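A sketch of the commands (run from the root account; the group name is assumed):

su - hdfs
hdfs dfs -mkdir -p /user/xiaobogu
hdfs dfs -chown xiaobogu:xiaobogu /user/xiaobogu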
I am not sure about Spark standalone mode, but with Spark-on-YARN it should
work. You can check the following link:
http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/
Thanks.
Zhan Zhang
On Feb 5, 2015, at 5:02 PM, Cheng Lian
lian.cs@gmail.com wrote:
Please note
Congratulations!
On Feb 3, 2015, at 2:34 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi all,
The PMC recently voted to add three new committers: Cheng Lian, Joseph
Bradley and Sean Owen. All three have been major contributors to Spark in the
past year: Cheng on Spark SQL, Joseph on
I think you can configure Hadoop/Hive to do impersonation. With kinit, there is
no difference between a secure and an insecure Hadoop cluster.
Thanks.
Zhan Zhang
On Feb 2, 2015, at 9:32 PM, Koert Kuipers
ko...@tresata.com wrote:
yes jobs run as the user that launched
You are running in yarn-client mode. How about increasing --driver-memory and
giving it a try?
Thanks.
Zhan Zhang
On Jan 29, 2015, at 6:36 PM, QiuxuanZhu
ilsh1...@gmail.com wrote:
Dear all,
I have no idea why it raises an error when I run the following code.
def
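For example, a sketch (the memory value and application names are assumed):

./bin/spark-submit --master yarn-client --driver-memory 4g --class MyApp my-app.jar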
I think it is expected. Refer to the comment on saveAsTable: “Note that this
currently only works with SchemaRDDs that are created from a HiveContext”. If I
understand correctly, the SchemaRDD here means one generated by
HiveContext.sql, not by applySchema.
Thanks.
Zhan Zhang
On Jan 29
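A sketch of the distinction (Spark 1.2-era API; the table names are assumed):

import org.apache.spark.sql.hive.HiveContext
val hc = new HiveContext(sc)
val fromHive = hc.sql("SELECT * FROM src")  // SchemaRDD created from a HiveContext
fromHive.saveAsTable("src_copy")            // works
// A SchemaRDD built via applySchema on a plain SQLContext would not support this.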
You can put hive-site.xml in your conf/ directory. It will connect to Hive when
HiveContext is initialized.
Thanks.
Zhan Zhang
On Jan 21, 2015, at 12:35 PM, YaoPau jonrgr...@gmail.com wrote:
Is this possible, and if so what steps do I need to take to make this happen?
You can try to add it in conf/spark-defaults.conf:
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value
-Dnumbers="one two three"
Thanks.
Zhan Zhang
On Jan 16, 2015, at 9:56 AM, Michel Dufresne sparkhealthanalyt...@gmail.com
wrote:
Hi All,
I'm trying to set some JVM
Hi Folks,
I am trying to run a HiveContext in yarn-cluster mode, but met some errors. Does
anybody know what causes the issue?
I used the following command to build the distribution:
./make-distribution.sh -Phive -Phive-thriftserver -Pyarn -Phadoop-2.4
15/01/13 17:59:42 INFO
Zhan Zhang created SPARK-5110:
-
Summary: Spark-on-Yarn does not work on windows platform
Key: SPARK-5110
URL: https://issues.apache.org/jira/browse/SPARK-5110
Project: Spark
Issue Type: Bug
Zhan Zhang created SPARK-5111:
-
Summary: HiveContext and Thriftserver cannot work in secure
cluster beyond hadoop2.5
Key: SPARK-5111
URL: https://issues.apache.org/jira/browse/SPARK-5111
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-5108:
--
Summary: Need to make jackson dependency version consistent with
hadoop-2.6.0. (was: Need to add more
[
https://issues.apache.org/jira/browse/SPARK-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266744#comment-14266744
]
Zhan Zhang commented on SPARK-5110:
---
You are right. I will mark this as duplicated.
Spark
[
https://issues.apache.org/jira/browse/SPARK-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang closed SPARK-5110.
-
Resolution: Duplicate
Spark-on-Yarn does not work on windows platform
Zhan Zhang created SPARK-5108:
-
Summary: Need to add more jackson dependency for hadoop-2.6.0
support.
Key: SPARK-5108
URL: https://issues.apache.org/jira/browse/SPARK-5108
Project: Spark
Issue
[
https://issues.apache.org/jira/browse/SPARK-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266529#comment-14266529
]
Zhan Zhang commented on SPARK-5108:
---
[~sowen] You are right.
Need to add more
[
https://issues.apache.org/jira/browse/SPARK-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266530#comment-14266530
]
Zhan Zhang commented on SPARK-5108:
---
java.lang.NoSuchMethodError
I think it is overflow. The training data is quite big. The algorithm's
scalability highly depends on the vocabSize. Even without overflow, there are
still other bottlenecks; for example, syn0Global and syn1Global each have
vocabSize * vectorSize elements.
Thanks.
Zhan Zhang
On Jan
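To get a feel for the sizes (illustrative numbers, not from the thread): with vocabSize = 1,000,000 and vectorSize = 300, each array holds 3 x 10^8 floats, about 1.2 GB at 4 bytes per float; and once vocabSize * vectorSize exceeds 2^31 - 1 (about 2.1 x 10^9), a 32-bit Int index overflows.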
Hi Manas,
There is a small patch needed for HDP 2.2. You can refer to this PR:
https://github.com/apache/spark/pull/3409
There are some other issues compiling against hadoop-2.6, but we will fully
support it very soon. You can ping me if you want.
Thanks.
Zhan Zhang
On Dec 12, 2014, at 11:38
Please check whether
https://github.com/apache/spark/pull/3409#issuecomment-64045677 solves the
problem of launching the AM.
Thanks.
Zhan Zhang
On Dec 1, 2014, at 4:49 PM, Mohammad Islam misla...@yahoo.com.INVALID wrote:
Hi,
How to pass the Java options (such as -XX:MaxMetaspaceSize=100M) when
some basic functions using hive-0.13 connect to
hive-0.14 metastore, and it looks like they are compatible.
Thanks.
Zhan Zhang
On Nov 22, 2014, at 7:14 AM, Cheng Lian lian.cs@gmail.com wrote:
Should emphasize that this is still a quick and rough conclusion, will
investigate
[
https://issues.apache.org/jira/browse/SPARK-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221474#comment-14221474
]
Zhan Zhang commented on SPARK-4461:
---
I changed the title to reflect the issue
[
https://issues.apache.org/jira/browse/SPARK-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221502#comment-14221502
]
Zhan Zhang commented on SPARK-4461:
---
Thanks for the information, Marcelo. I changed
[
https://issues.apache.org/jira/browse/SPARK-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221502#comment-14221502
]
Zhan Zhang edited comment on SPARK-4461 at 11/21/14 10:23 PM
on hive, e.g., metastore, thriftserver,
hcatalog may not be able to help much.
Does anyone have any insight or idea in mind?
Thanks.
Zhan Zhang
and more features added, it would be great if users can take advantage of both.
Currently, Spark SQL gives us such benefits partially, but I am wondering how
to keep such integration in the long term.
Thanks.
Zhan Zhang
On Nov 21, 2014, at 3:12 PM, Dean Wampler deanwamp...@gmail.com wrote:
I can't comment
Zhan Zhang created SPARK-4461:
-
Summary: Spark should not relies on mapred-site.xml for classpath
Key: SPARK-4461
URL: https://issues.apache.org/jira/browse/SPARK-4461
Project: Spark
Issue Type
[
https://issues.apache.org/jira/browse/SPARK-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-4461:
--
Description:
Currently Spark reads mapred-site.xml to get the classpath. From hadoop-2.6,
the library
. You can refer
to https://github.com/apache/spark/pull/2685 for the whole story.
Thanks.
Zhan Zhang
Thanks.
Zhan Zhang
On Nov 5, 2014, at 4:47 PM, Cheng, Hao hao.ch...@intel.com wrote:
Hi, all, I noticed that when compiling the SparkSQL with profile
“hive-0.13.1”, it will fetch the Hive
[
https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195402#comment-14195402
]
Zhan Zhang commented on SPARK-3720:
---
[~neoword] wangfei and I are working together
[
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195427#comment-14195427
]
Zhan Zhang commented on SPARK-2883:
---
[~neoword], as [~marmbrus] mentioned, the PR need
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191052#comment-14191052
]
Zhan Zhang commented on SPARK-1537:
---
YARN-2521 can make the client easier to use
] = {
sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p), allowLocal =
false).head
}
(0 until partitions.length).iterator.flatMap(i => collectPartition(i))
}
Thanks.
Zhan Zhang
On Oct 29, 2014, at 3:43 AM, Yanbo Liang yanboha...@gmail.com wrote:
RDD.toLocalIterator
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189651#comment-14189651
]
Zhan Zhang commented on SPARK-1537:
---
Hi Marcelo,
Do you have an update on this? If you
“-Phive” is to enable hive-0.13.1, and “-Phive -Phive-0.12.0” is to enable
hive-0.12.0. Note that the thrift-server is not yet supported with hive-0.13,
but it is expected to go upstream soon (SPARK-3720).
Thanks.
Zhan Zhang
On Oct 28, 2014, at 9:09 PM, Stephen Boesch java...@gmail.com wrote
You can set your executor count with --num-executors. Also, changing to
yarn-client mode saves you one container for the driver. Then check your YARN
resource manager to make sure there are more containers available to serve
your extra apps.
Thanks.
Zhan Zhang
On Oct 28, 2014, at 5:31 PM, Soumya Simanta
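For example, a sketch (the numbers and application names are assumed):

./bin/spark-submit --master yarn-client --num-executors 8 --executor-memory 2g --class MyApp my-app.jar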
I think it is already lazily computed, or do you mean something else? The
following is the signature of compute in RDD:
def compute(split: Partition, context: TaskContext): Iterator[T]
Thanks.
Zhan Zhang
On Oct 28, 2014, at 8:15 PM, Dai, Kevin yun...@ebay.com wrote:
Hi, ALL
I have a RDD[T
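A sketch of the laziness (an existing rdd and a transform function are assumed):

val mapped = rdd.map(expensiveTransform)  // nothing runs yet; only lineage is recorded
mapped.count()                            // compute() is invoked here, per partition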
Can you use row(i).asInstanceOf[]
Thanks.
Zhan Zhang
On Oct 28, 2014, at 5:03 PM, Mohammed Guller moham...@glassbeam.com wrote:
Hi –
The Spark SQL Row class has methods such as getInt, getLong, getBoolean,
getFloat, getDouble, etc. However, I don’t see a getDate method. So how can
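A sketch of that cast (assuming column i holds a date value):

val d = row(i).asInstanceOf[java.sql.Date]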
Zhan Zhang created SPARK-4103:
-
Summary: Clean up SessionState in HiveContext
Key: SPARK-4103
URL: https://issues.apache.org/jira/browse/SPARK-4103
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185889#comment-14185889
]
Zhan Zhang commented on SPARK-4103:
---
There are already some efforts (Spark-4037
[
https://issues.apache.org/jira/browse/SPARK-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185889#comment-14185889
]
Zhan Zhang edited comment on SPARK-4103 at 10/27/14 10:56 PM