[
https://issues.apache.org/jira/browse/SPARK-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185889#comment-14185889
]
Zhan Zhang edited comment on SPARK-4103 at 10/27/14 11:10 PM
[
https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183452#comment-14183452
]
Zhan Zhang commented on SPARK-2706:
---
I just checked the trunk; the change is already
[
https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183493#comment-14183493
]
Zhan Zhang commented on SPARK-2706:
---
Michael, Please ignore my last email. I thought
[
https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174073#comment-14174073
]
Zhan Zhang commented on SPARK-2706:
---
The code has not gone upstream yet. To build
[
https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-3720:
--
Attachment: orc.diff
This is the diff for orc file support. Because I also work on this item
[
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-2883:
--
Attachment: orc.diff
For completeness, I am also attaching the diff here in addition to SPARK-3720.
Spark
[
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173269#comment-14173269
]
Zhan Zhang commented on SPARK-2883:
---
It is a Scala patch, and I am not very familiar
[
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171654#comment-14171654
]
Zhan Zhang edited comment on SPARK-2883 at 10/14/14 10:41 PM
[
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171654#comment-14171654
]
Zhan Zhang commented on SPARK-2883:
---
I have almost finished the prototype, and following
[
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171654#comment-14171654
]
Zhan Zhang edited comment on SPARK-2883 at 10/14/14 10:44 PM
[
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171654#comment-14171654
]
Zhan Zhang edited comment on SPARK-2883 at 10/14/14 11:04 PM
[
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171654#comment-14171654
]
Zhan Zhang edited comment on SPARK-2883 at 10/14/14 11:59 PM
[
https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164222#comment-14164222
]
Zhan Zhang commented on SPARK-3720:
---
There is another JIRA, SPARK-2883, opened on 06/Aug
[
https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164224#comment-14164224
]
Zhan Zhang commented on SPARK-3720:
---
By the way, I don't think the current approach can
[
https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164222#comment-14164222
]
Zhan Zhang edited comment on SPARK-3720 at 10/8/14 9:45 PM
[
https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164224#comment-14164224
]
Zhan Zhang edited comment on SPARK-3720 at 10/8/14 9:47 PM
[
https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147939#comment-14147939
]
Zhan Zhang commented on SPARK-3633:
---
Increasing timeout does not help my case either. I
Try this:
import org.apache.spark.SparkContext._
Thanks.
Zhan Zhang
On Sep 24, 2014, at 6:13 AM, david david...@free.fr wrote:
Thanks.
I've already tried this solution, but it does not compile (in Eclipse).
I'm surprised to see that in spark-shell, sortByKey works fine on 2
solutions
Here is my understanding
def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T] = {
  if (num == 0) { // if 0, return an empty array
    Array.empty
  } else {
    mapPartitions { items => // map each partition to a new one
      // whose iterator consists of the single queue,
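The idea described in the snippet above (reduce each partition to a small queue, then merge the queues) can be illustrated without Spark. The following is a minimal sketch in plain Scala, not Spark's actual code: `TakeOrderedSketch`, `topOfPartition`, and the use of `Seq[Seq[T]]` to stand in for an RDD's partitions are all assumptions made for illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the per-partition queue idea; not Spark's implementation.
object TakeOrderedSketch {
  // Keep only the `num` smallest elements of one partition in a bounded max-heap.
  def topOfPartition[T](items: Iterator[T], num: Int)(implicit ord: Ordering[T]): Seq[T] =
    if (num <= 0) Seq.empty
    else {
      // With the natural ordering, the head of the queue is the largest element kept so far.
      val heap = mutable.PriorityQueue.empty[T](ord)
      items.foreach { x =>
        if (heap.size < num) heap.enqueue(x)
        else if (ord.lt(x, heap.head)) { heap.dequeue(); heap.enqueue(x) }
      }
      heap.toSeq
    }

  // Merge the per-partition candidates and keep the `num` smallest overall.
  def takeOrdered[T](partitions: Seq[Seq[T]], num: Int)(implicit ord: Ordering[T]): Seq[T] =
    if (num == 0) Seq.empty
    else partitions.flatMap(p => topOfPartition(p.iterator, num)).sorted(ord).take(num)
}
```

Each partition contributes at most `num` candidates, so the final merge step handles at most `num` times the number of partitions elements rather than the whole data set.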
Something is definitely wrong. For me, it takes 10 to 30 minutes.
Thanks.
Zhan Zhang
On Sep 23, 2014, at 10:02 PM, christy 760948...@qq.com wrote:
This process began yesterday and it has already run for more than 20 hours.
Is this normal? Has anyone had the same problem? No error has been thrown yet
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139396#comment-14139396
]
Zhan Zhang commented on SPARK-1537:
---
Do you have any update on this, or any schedule
[
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137759#comment-14137759
]
Zhan Zhang commented on SPARK-2883:
---
I am starting to prototype the last feature
[
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136102#comment-14136102
]
Zhan Zhang commented on SPARK-2883:
---
There are several features to be supported. 1st
[
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136481#comment-14136481
]
Zhan Zhang commented on SPARK-2883:
---
Sorry, I mean column pruning. Currently what I see
[
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-2883:
--
Comment: was deleted
(was: Sorry, I mean column pruning. Currently what I see is that HiveTableScan
Try this:
import org.apache.spark.SparkContext._
Thanks.
Zhan Zhang
On Sep 4, 2014, at 4:36 PM, Veeranagouda Mukkanagoudar veera...@gmail.com
wrote:
I am planning to use the RDD join operation. To test it out, I was trying to
compile some test code, but I am getting the following compilation error
://sandbox.hortonworks.com:8020/tmp/wordcount)
Thanks.
Zhan Zhang
On Aug 26, 2014, at 12:35 AM, motte1988 wir12...@studserv.uni-leipzig.de
wrote:
Hello,
it's me again.
Now I've got an explanation for the behaviour. It seems that the driver
memory is not large enough to hold the whole result set
[
https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118979#comment-14118979
]
Zhan Zhang commented on SPARK-2706:
---
send out pull request https://github.com/apache
issue to spark-2706 soon.
Thanks.
Zhan Zhang
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Working-Formula-for-Hive-0-13-tp7551p8118.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com
[
https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-2706:
--
Attachment: v1.0.2.diff
This is the patch against v1.0.2. I didn't fix the test cases. The regular
I think it depends on your job. In my own experience running terabyte-scale
data, Spark hit lost-connection failures when I used a big JVM with large
memory, but with more executors, each with small memory, it ran very smoothly.
I was running Spark on YARN.
Thanks.
Zhan Zhang
On Aug 21, 2014, at 3:42 PM
[
https://issues.apache.org/jira/browse/HIVE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105103#comment-14105103
]
Zhan Zhang commented on HIVE-4523:
--
I also ran into the same problem with the new round UDF
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104881#comment-14104881
]
Zhan Zhang commented on SPARK-1537:
---
Thanks for sharing this. Do you have a concrete plan
the reduceByKey because it is not cached.
I agree with you that it is very confusing.
Thanks.
Zhan Zhang
The f
On Aug 20, 2014, at 2:28 PM, Patrick Wendell pwend...@gmail.com wrote:
The reason is that some operators get pipelined into a single stage.
rdd.map(XX).filter(YY) - this executes in a single
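The pipelining that the quoted message describes can be imitated with a plain Scala `Iterator`, which also applies `map` and `filter` element by element rather than materializing an intermediate collection. This is only an analogy under that assumption, not Spark code; `PipelineSketch` and `trace` are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration: record the order in which map and filter touch each element.
object PipelineSketch {
  def trace(input: Seq[Int]): Seq[String] = {
    val log = ArrayBuffer.empty[String]
    val pipelined = input.iterator
      .map { x => log += s"map($x)"; x * 2 }        // lazy: runs only when an element is pulled
      .filter { x => log += s"filter($x)"; x > 2 }  // pulls one mapped element at a time
    pipelined.foreach(_ => ()) // consuming the iterator drives both steps
    log.toSeq
  }
}
```

Calling `PipelineSketch.trace(Seq(1, 2))` records `map(1)`, `filter(2)`, `map(2)`, `filter(4)`: the two steps interleave per element instead of running as two separate passes.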
( )).map(word => (word,
1)).reduceByKey((a, b) => a + b)
counts.saveAsTextFile("file") // anyway, you don't want to collect the results
// to the master; put them in a file instead.
Thanks.
Zhan Zhang
On Aug 16, 2014, at 9:18 AM, Jerry Ye jerr...@gmail.com wrote:
The job ended up running overnight
I am not sure exactly how you use it. My understanding is that in Spark it is
better to keep the overhead on the driver as low as possible. Is it possible to
broadcast the trie to the executors, do the computation there, and then
aggregate the counters (??) in the reduce phase?
Thanks.
Zhan Zhang
On Aug 18
String HBASE_TABLE_NAME = "hbase.table.name";
Thanks.
Zhan Zhang
On Aug 17, 2014, at 11:39 PM, Cesar Arevalo ce...@zephyrhealthinc.com wrote:
HadoopRDD
.
Zhan Zhang
On Aug 18, 2014, at 11:26 AM, Peng Cheng pc...@uow.edu.au wrote:
I'm curious to see that if you declare broadcasted wrapper as a var, and
overwrite it in the driver program, the modification can have stable impact
on all transformations/actions defined BEFORE the overwrite
[
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099275#comment-14099275
]
Zhan Zhang commented on SPARK-2883:
---
Spark with Hive12 can operate Orc table through
Zhan Zhang created HIVE-7743:
Summary: appendReadColumns should have sanity check
Key: HIVE-7743
URL: https://issues.apache.org/jira/browse/HIVE-7743
Project: Hive
Issue Type: Bug
I tried a simple spark-hive select and insert, and it works. But to manipulate
the ORCFile directly through an RDD, Spark has to be upgraded to support
hive-0.13 first, because some ORC APIs are not exposed in Hive-0.12.
Thanks.
Zhan Zhang
On Aug 11, 2014, at 10:23 PM, vinay.kash
Yes, you are right, but I tried the old hadoopFile for OrcInputFormat. In
Hive12, OrcStruct does not expose its API, so Spark cannot access it. With
Hive13, an RDD can read from an OrcFile. Btw, I didn't see ORCNewOutputFormat
in hive-0.13.
Direct RDD manipulation (Hive13)
val inputRead =
I agree. We need support for end users similar to what exists for Parquet
files. That's the purpose of SPARK-2883.
Thanks.
Zhan Zhang
On Aug 14, 2014, at 11:42 AM, Yin Huai huaiyin@gmail.com wrote:
I feel that using hadoopFile and saveAsHadoopFile to read and write ORCFile
are more towards
[
https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094810#comment-14094810
]
Zhan Zhang commented on SPARK-2706:
---
It may be caused by a Hive-side change; in the TestHive
Problem solved by a workaround using create database and use database.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-testsuite-error-for-hive-0-13-tp7807p7819.html
I am trying to change Spark to support hive-0.13, but I always hit the
following problem when running the tests. My feeling is that the test setup may
need to change, but I don't know exactly how. Has anyone had a similar issue,
or can anyone shed light on it?
13:50:53.331 ERROR org.apache.hadoop.hive.ql.Driver: FAILED:
Thanks Sean,
I changed both the API and the version because there are some
incompatibilities with hive-0.13, and I can actually do some basic operations
against a real Hive environment. But the test suite always complains with a
"no default database" message. No clue yet.
[
https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-2706:
--
Attachment: hive.diff
mvn -Phive -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
[
https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-2706:
--
Attachment: (was: hive.diff)
Enable Spark to support Hive 0.13
[
https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-2706:
--
Attachment: hive.diff
Patch to the latest spark trunk.
I only tested with the following compilation
mvn
[
https://issues.apache.org/jira/browse/SPARK-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhan Zhang updated SPARK-2706:
--
Comment: was deleted
(was: mvn -Phive -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean
The API change does not seem major. I have changed it locally and compiled it,
but not tested it yet. The major problem is still how to solve the hive-exec jar
dependency. I am willing to help on this issue. Is it better to stick to the
same approach as hive-0.12 until hive-exec is cleaned up enough to switch back?
--
I can compile with no error, but my patch also includes other stuff.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Working-Formula-for-Hive-0-13-tp7551p7775.html
Here is the patch. Please ignore the pom.xml-related change, which is just for
compilation purposes. I need to work on this further, based on Wandou's
previous work.
Sorry, I forgot to upload the file. I have never posted before :) hive.diff
http://apache-spark-developers-list.1001551.n3.nabble.com/file/n/hive.diff
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Working-Formula-for-Hive-0-13-tp7551p.html
Attached the diff for SPARK-2706. I am currently working on this problem.
If somebody else is also working on this, we can share the load.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Working-Formula-for-Hive-0-13-tp7551p7782.html
Zhan Zhang created SPARK-2883:
-
Summary: Spark Support for ORCFile format
Key: SPARK-2883
URL: https://issues.apache.org/jira/browse/SPARK-2883
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086657#comment-14086657
]
Zhan Zhang commented on SPARK-1537:
---
I am also interested in it and trying to integrate
[
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086808#comment-14086808
]
Zhan Zhang commented on SPARK-1537:
---
Do you mind sharing your thoughts, design document
Thanks a lot.
By "spins up", do you mean using the same directory, as specified by the following?
/** Local directory to save .class files too */
val outputDir = {
val tmp = System.getProperty("java.io.tmpdir")
val rootDir = new SparkConf().get("spark.repl.classdir", tmp)