From: Todd [mailto:bit1...@163.com]
Sent: Friday, September 11, 2015 2:17 PM
To: Cheng, Hao
Cc: Jesse F Chen; Michael Armbrust; user@spark.apache.org
Subject: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with
spark 1.4.1 SQL
Thanks Hao for the reply.
I turned the merge sort join off.
Sent: September 11, 2015 3:39 PM
To: Todd
Cc: Cheng, Hao; Jesse F Chen; Michael Armbrust; user@spark.apache.org
Subject: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+
compared with spark 1.4.1 SQL
I added the following two options:
spark.sql.planner.sortMergeJoin=false
This is not a big surprise; SMJ is slower than the HashJoin, as we do not
fully utilize the sorting yet. More details can be found at
https://issues.apache.org/jira/browse/SPARK-2926.
Anyway, can you disable the sort merge join by
“spark.sql.planner.sortMergeJoin=false;” in Spark 1.5, and
Would it be helpful to add JVM options like:
-XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled
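For context, those flags would typically be passed through to the driver JVM at submit time. A minimal sketch (the application class and jar names are hypothetical, and the PermGen flags only matter on Java 7-era JVMs):

```shell
# Pass the CMS class-unloading flags to the driver JVM.
# (Only relevant pre-Java 8; PermGen was removed in Java 8.)
spark-submit \
  --driver-java-options "-XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled" \
  --class com.example.MyApp \
  my-app.jar
```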
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Thursday, September 10, 2015 5:31 AM
To: Sandy Ryza
Cc: user@spark.apache.org
Subject: Re: Driver OOM after upgrading to 1.5
It's
[
https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734395#comment-14734395
]
Cheng Hao commented on SPARK-10484:
---
In the cartesian product implementation, there are two levels of nested loops
[
https://issues.apache.org/jira/browse/SPARK-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736016#comment-14736016
]
Cheng Hao commented on SPARK-10466:
---
Sorry, [~davies], I found the spark conf doesn't take effect when
Cheng Hao created SPARK-10466:
-
Summary: UnsafeRow exception in Sort-Based Shuffle with data spill
Key: SPARK-10466
URL: https://issues.apache.org/jira/browse/SPARK-10466
Project: Spark
Issue
Not sure if it’s too late, but we found a critical bug at
https://issues.apache.org/jira/browse/SPARK-10466
UnsafeRow ser/de will cause an assert error, particularly for sort-based shuffle
with data spill; this is not acceptable, as it's very common in large table
joins.
From: Reynold Xin
Hi, can you try something like:
val rowRDD = sc.textFile("/user/spark/short_model").map { line =>
  val p = line.split("\\t")
  if (p.length >= 72) {
    Row(p(0), p(1)…)
  } else {
    throw new RuntimeException(s"failed in parsing $line")
  }
}
From the log
Cheng Hao created SPARK-10327:
-
Summary: Cache Table is not working while subquery has alias in
its project list
Key: SPARK-10327
URL: https://issues.apache.org/jira/browse/SPARK-10327
Project: Spark
Cheng Hao created SPARK-10270:
-
Summary: Add/Replace some Java friendly DataFrame API
Key: SPARK-10270
URL: https://issues.apache.org/jira/browse/SPARK-10270
Project: Spark
Issue Type
Ok, I see, thanks for the correction, but this should be optimized.
From: Shixiong Zhu [mailto:zsxw...@gmail.com]
Sent: Tuesday, August 25, 2015 2:08 PM
To: Cheng, Hao
Cc: Jeff Zhang; user@spark.apache.org
Subject: Re: DataFrame#show cost 2 Spark Jobs ?
That's two jobs. `SparkPlan.executeTake
[
https://issues.apache.org/jira/browse/SPARK-10215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710719#comment-14710719
]
Cheng Hao commented on SPARK-10215:
---
Yes, that's a blocker issue for our customer, I
Oh, sorry, I missed your reply!
I know the minimum number of tasks will be 2 for scanning, but Jeff is talking
about 2 jobs, not 2 tasks.
From: Shixiong Zhu [mailto:zsxw...@gmail.com]
Sent: Tuesday, August 25, 2015 1:29 PM
To: Cheng, Hao
Cc: Jeff Zhang; user@spark.apache.org
Subject: Re: DataFrame
Did you register the temp table via beeline or in a new Spark SQL CLI?
As far as I know, a temp table cannot cross HiveContext instances.
Hao
From: Udit Mehta [mailto:ume...@groupon.com]
Sent: Wednesday, August 26, 2015 8:19 AM
To: user
Subject: Spark thrift server on yarn
Hi,
I am trying to start a
The first job is to infer the JSON schema, and the second one is the query you
actually meant.
You can provide the schema while loading the json file, like below:
sqlContext.read.schema(xxx).json(“…”)?
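As a concrete sketch of that suggestion (the field names and path below are hypothetical, and the Spark 1.x sqlContext API is assumed):

```scala
import org.apache.spark.sql.types._

// Supplying the schema up front skips the schema-inference pass over
// the data, so only the actual query runs as a job.
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))

val df = sqlContext.read.schema(schema).json("/path/to/data.json")
```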
Hao
From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Monday, August 24, 2015 6:20 PM
To:
And be sure hive-site.xml is on the classpath or under $SPARK_HOME/conf.
Hao
From: Ishwardeep Singh [mailto:ishwardeep.si...@impetus.co.in]
Sent: Monday, August 24, 2015 8:57 PM
To: user
Subject: Re: Loading already existing tables in spark shell
Hi Jeetendra,
I faced
loading the data for JSON; it probably causes a longer ramp-up time with a
large number of files/partitions.
From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Tuesday, August 25, 2015 8:11 AM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: DataFrame#show cost 2 Spark Jobs ?
Hi Cheng,
I
Cheng Hao created SPARK-10215:
-
Summary: Div of Decimal returns null
Key: SPARK-10215
URL: https://issues.apache.org/jira/browse/SPARK-10215
Project: Spark
Issue Type: Bug
Components
Yes, check the source code under:
https://github.com/apache/spark/tree/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst
From: Todd [mailto:bit1...@163.com]
Sent: Tuesday, August 25, 2015 1:01 PM
To: user@spark.apache.org
Subject: Test case for the spark sql catalyst
Hi, Are
[
https://issues.apache.org/jira/browse/SPARK-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Hao updated SPARK-10134:
--
Priority: Minor (was: Major)
Improve the performance of Binary Comparison
[
https://issues.apache.org/jira/browse/SPARK-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Hao updated SPARK-10134:
--
Fix Version/s: (was: 1.6.0)
Improve the performance of Binary Comparison
[
https://issues.apache.org/jira/browse/SPARK-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708766#comment-14708766
]
Cheng Hao commented on SPARK-10134:
---
We can improve that by enabling the comparison every
[
https://issues.apache.org/jira/browse/SPARK-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704513#comment-14704513
]
Cheng Hao commented on SPARK-10130:
---
Can you change the fix version to 1.5? Lots
Cheng Hao created SPARK-10134:
-
Summary: Improve the performance of Binary Comparison
Key: SPARK-10134
URL: https://issues.apache.org/jira/browse/SPARK-10134
Project: Spark
Issue Type
[
https://issues.apache.org/jira/browse/SPARK-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704311#comment-14704311
]
Cheng Hao commented on SPARK-9357:
--
JoinedRow does increase the overhead by adding layer
[
https://issues.apache.org/jira/browse/SPARK-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704311#comment-14704311
]
Cheng Hao edited comment on SPARK-9357 at 8/20/15 5:28 AM
[
https://issues.apache.org/jira/browse/SPARK-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704311#comment-14704311
]
Cheng Hao edited comment on SPARK-9357 at 8/20/15 5:29 AM
Yes, you can try setting spark.sql.sources.partitionDiscovery.enabled to false.
BTW, which version are you using?
Hao
From: Jerrick Hoang [mailto:jerrickho...@gmail.com]
Sent: Thursday, August 20, 2015 12:16 PM
To: Philip Weaver
Cc: user
Subject: Re: Spark Sql behaves strangely with tables with
Sent: Thursday, August 20, 2015 1:46 PM
To: Cheng, Hao
Cc: Philip Weaver; user
Subject: Re: Spark Sql behaves strangely with tables with a lot of partitions
I cloned from TOT after the 1.5.0 cut-off. I noticed there were a couple of CLs
trying to speed up Spark SQL with tables with a huge number of partitions; I've
made
[
https://issues.apache.org/jira/browse/SPARK-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702603#comment-14702603
]
Cheng Hao commented on SPARK-9357:
--
JoinedRow is probably highly efficient for the case
[
https://issues.apache.org/jira/browse/SPARK-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702584#comment-14702584
]
Cheng Hao commented on SPARK-7218:
--
Can you give some BKM for this task?
Create a real
Cheng Hao created SPARK-10044:
-
Summary: AnalysisException in resolving reference for sorting with
aggregation
Key: SPARK-10044
URL: https://issues.apache.org/jira/browse/SPARK-10044
Project: Spark
I found that https://spark-prs.appspot.com/ is super slow when opened in a new
window recently; not sure if it's just me or everybody experiences the same. Is
there any way to speed it up?
From: Josh Rosen [mailto:rosenvi...@gmail.com]
Sent: Friday, August 14, 2015 10:21 AM
To: dev
Subject: Re:
OK, thanks, probably just me…
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, August 14, 2015 11:04 AM
To: Cheng, Hao
Cc: Josh Rosen; dev
Subject: Re: Automatically deleting pull request comments left by AmplabJenkins
I tried accessing just now.
It took several seconds before
[
https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696471#comment-14696471
]
Cheng Hao commented on SPARK-8240:
--
It's probably very difficult to define the function
[
https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696469#comment-14696469
]
Cheng Hao commented on SPARK-8240:
--
It works for me like:
{code}
sql(select concat
[
https://issues.apache.org/jira/browse/SPARK-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696338#comment-14696338
]
Cheng Hao commented on SPARK-9879:
--
I create a new physical operator called LargeLimit
Cheng Hao created SPARK-9879:
Summary: OOM in CTAS with LIMIT
Key: SPARK-9879
URL: https://issues.apache.org/jira/browse/SPARK-9879
Project: Spark
Issue Type: Bug
Components: SQL
[
https://issues.apache.org/jira/browse/SPARK-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Hao updated SPARK-9879:
-
Summary: OOM in LIMIT clause with large number (was: OOM in CTAS with
LIMIT)
OOM in LIMIT clause
[
https://issues.apache.org/jira/browse/SPARK-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Hao updated SPARK-9879:
-
Description:
{code}
create table spark.tablsetest as select * from dpa_ord_bill_tf order by
member_id
Refreshing a table only works for Spark SQL data sources in my understanding;
apparently here, you're running against a Hive table.
Can you try to create a table like:
|CREATE TEMPORARY TABLE parquetTable (a int, b string)
|USING org.apache.spark.sql.parquet.DefaultSource
That's a good question. We don't support reading small files in a single
partition yet, but it's definitely an issue we need to optimize. Do you mind
creating a JIRA issue for this? Hopefully we can merge that in the 1.6 release.
200 is the default partition number for parallel tasks after the
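The 200 referred to here is the `spark.sql.shuffle.partitions` setting; a sketch of overriding it (the value 64 is only illustrative):

```scala
// Default is 200 post-shuffle partitions; tune it to the data size.
sqlContext.setConf("spark.sql.shuffle.partitions", "64")
// Equivalently, in SQL: SET spark.sql.shuffle.partitions=64;
```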
Firstly, spark.sql.autoBroadcastJoinThreshold only works for the EQUAL JOIN.
Currently, for a non-equal join, if the join type is an INNER join then it
will be done by a CartesianProduct join, and BroadcastNestedLoopJoin handles the
outer joins.
In the BroadcastNestedLoopJoin, the table
Definitely worth a try. And you can sort the records before writing them out;
then you will get parquet files without overlapping keys.
Let us know if that helps.
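A minimal sketch of that suggestion (the key column and paths are hypothetical; the Spark 1.4+ DataFrame API is assumed):

```scala
// Sort on the key before writing, so each output parquet file covers
// a non-overlapping key range.
val sorted = df.sort("member_id")        // "member_id" is a hypothetical key column
sorted.write.parquet("/path/to/output")
```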
Hao
From: Philip Weaver [mailto:philip.wea...@gmail.com]
Sent: Wednesday, August 12, 2015 4:05 AM
To: Cheng Lian
Cc: user
Cheng Hao created SPARK-9735:
Summary: Auto infer partition schema of HadoopFsRelation should
respect the user-specified one
Key: SPARK-9735
URL: https://issues.apache.org/jira/browse/SPARK-9735
[
https://issues.apache.org/jira/browse/SPARK-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661359#comment-14661359
]
Cheng Hao commented on SPARK-9689:
--
After investigation, the root cause for the failure
[
https://issues.apache.org/jira/browse/SPARK-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Hao updated SPARK-9689:
-
Description:
{code:title=example|borderStyle=solid}
// create a HadoopFsRelation based table
sql
Cheng Hao created SPARK-9689:
Summary: Cache doesn't refresh for HadoopFsRelation based table
Key: SPARK-9689
URL: https://issues.apache.org/jira/browse/SPARK-9689
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652892#comment-14652892
]
Cheng Hao commented on SPARK-7119:
--
[~marmbrus] This is actually a bug fixing
Cheng Hao created SPARK-9381:
Summary: Migrate JSON data source to the new partitioning data
source
Key: SPARK-9381
URL: https://issues.apache.org/jira/browse/SPARK-9381
Project: Spark
Issue
[
https://issues.apache.org/jira/browse/SPARK-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642706#comment-14642706
]
Cheng Hao commented on SPARK-9374:
--
[~cloud_fan] Can you also take a look at this failure
[
https://issues.apache.org/jira/browse/SPARK-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638276#comment-14638276
]
Cheng Hao commented on SPARK-9239:
--
[~yhuai] are you working on this now? Or I can take
[
https://issues.apache.org/jira/browse/SPARK-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629107#comment-14629107
]
Cheng Hao commented on SPARK-8230:
--
[~pedrorodriguez], actually [~TarekAuel] set a good
Have you ever tried querying “select * from temp_table” from the spark shell?
Or can you try the --jars option while starting the spark shell?
From: Srikanth [mailto:srikanth...@gmail.com]
Sent: Thursday, July 16, 2015 9:36 AM
To: user
Subject: Re: HiveThriftServer2.startWithContext error with
Actually it's supposed to be part of the Spark 1.5 release, see
https://issues.apache.org/jira/browse/SPARK-8230
You're definitely welcome to contribute to it; let me know if you have any
questions about implementing it.
Cheng Hao
-Original Message-
From: pedro [mailto:ski.rodrig...@gmail.com
Can you describe how you cached the tables? In another HiveContext? AFAIK,
a cached table is only visible within the same HiveContext; you probably need to
execute a SQL query like
“CACHE TABLE mytable AS SELECT xxx” in the JDBC connection as well.
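Concretely, run the statement in the same JDBC connection (hence the same HiveContext) that will later query the table; the table and query below are illustrative:

```scala
// Against the shared HiveContext backing the JDBC server, e.g. via
// sqlContext.sql(...) or the equivalent statement typed into beeline.
sqlContext.sql("CACHE TABLE mytable AS SELECT * FROM source_table")
```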
Cheng Hao
From: Brandon White
So you're using different HiveContext instances for the caching. We would not
expect tables cached with one HiveContext instance to be visible from another.
From: Brandon White [mailto:bwwintheho...@gmail.com]
Sent: Wednesday, July 15, 2015 8:48 AM
To: Cheng, Hao
Cc: user
Subject: Re: How do you
[
https://issues.apache.org/jira/browse/SPARK-8956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624121#comment-14624121
]
Cheng Hao commented on SPARK-8956:
--
Sorry, I didn't notice this jira issue when I created
[
https://issues.apache.org/jira/browse/SPARK-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Hao updated SPARK-8972:
-
Description:
{code:java}
import sqlContext.implicits._
case class KeyValue(key: Int, value: String)
val
Cheng Hao created SPARK-8972:
Summary: Wrong result for rollup
Key: SPARK-8972
URL: https://issues.apache.org/jira/browse/SPARK-8972
Project: Spark
Issue Type: Bug
Components: SQL
[
https://issues.apache.org/jira/browse/SPARK-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Hao updated SPARK-8972:
-
Summary: Incorrect result for rollup (was: Wrong result for rollup)
Incorrect result for rollup
Never mind, I’ve created the jira issue at
https://issues.apache.org/jira/browse/SPARK-8972.
From: Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Friday, July 10, 2015 9:15 AM
To: yana.kadiy...@gmail.com; ayan guha
Cc: user
Subject: RE: [SparkSQL] Incorrect ROLLUP results
Yes, this is a bug, do
Yes, this is a bug; do you mind creating a JIRA issue for this? I will fix
this ASAP.
BTW, what's your Spark version?
From: Yana Kadiyska [mailto:yana.kadiy...@gmail.com]
Sent: Friday, July 10, 2015 12:16 AM
To: ayan guha
Cc: user
Subject: Re: [SparkSQL] Incorrect ROLLUP results
[
https://issues.apache.org/jira/browse/SPARK-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Hao updated SPARK-8864:
-
Comment: was deleted
(was: Thanks for explanation. The design looks good to me now.)
Date/time function
[
https://issues.apache.org/jira/browse/SPARK-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618200#comment-14618200
]
Cheng Hao commented on SPARK-8864:
--
Thanks for explanation. The design looks good to me
[
https://issues.apache.org/jira/browse/SPARK-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618201#comment-14618201
]
Cheng Hao commented on SPARK-8864:
--
Thanks for explanation. The design looks good to me
[
https://issues.apache.org/jira/browse/SPARK-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Hao updated SPARK-7119:
-
Priority: Blocker (was: Major)
ScriptTransform doesn't consider the output data type
Cheng Hao created SPARK-8867:
Summary: Show the UDF usage for user.
Key: SPARK-8867
URL: https://issues.apache.org/jira/browse/SPARK-8867
Project: Spark
Issue Type: Task
Components
Cheng Hao created SPARK-8883:
Summary: Remove the class OverrideFunctionRegistry
Key: SPARK-8883
URL: https://issues.apache.org/jira/browse/SPARK-8883
Project: Spark
Issue Type: Improvement
dataframe.limit(1).selectExpr(xxx).collect()?
-Original Message-
From: chrish2312 [mailto:c...@palantir.com]
Sent: Wednesday, July 8, 2015 6:20 AM
To: user@spark.apache.org
Subject: Hive UDFs
I know the typical way to apply a hive UDF to a dataframe is basically
something like:
[
https://issues.apache.org/jira/browse/SPARK-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617846#comment-14617846
]
Cheng Hao commented on SPARK-8864:
--
A Long holds 2^63 ≈ 9.2E18 values, and the timestamp is in us (microseconds)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class
org.apache.derby.jdbc.EmbeddedDriver
It will usually be included in the assembly jar, so not sure what's wrong. But
can you try adding the derby jar to the driver classpath and try again?
-Original Message-
From: bdev
Cheng Hao created SPARK-8791:
Summary: Make a better hashcode for InternalRow
Key: SPARK-8791
URL: https://issues.apache.org/jira/browse/SPARK-8791
Project: Spark
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/SPARK-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612728#comment-14612728
]
Cheng Hao commented on SPARK-8159:
--
Will it be possible to add all of the expressions
[
https://issues.apache.org/jira/browse/SPARK-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609627#comment-14609627
]
Cheng Hao commented on SPARK-8653:
--
Yes, I agree that we cannot make a clear cut
[
https://issues.apache.org/jira/browse/SPARK-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609609#comment-14609609
]
Cheng Hao commented on SPARK-8653:
--
For most of the Mathematical expressions, we can get
[
https://issues.apache.org/jira/browse/SPARK-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609629#comment-14609629
]
Cheng Hao commented on SPARK-8653:
--
What do you think [~rxin]?
Add constraint
[
https://issues.apache.org/jira/browse/SPARK-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607195#comment-14607195
]
Cheng Hao commented on SPARK-8653:
--
[~rxin] I'll agree that we need to rename the trait
Cheng Hao created SPARK-8653:
Summary: Add constraint for Children expression for data type
Key: SPARK-8653
URL: https://issues.apache.org/jira/browse/SPARK-8653
Project: Spark
Issue Type: Sub
Yes, it should be with HiveContext, not SQLContext.
From: ayan guha [mailto:guha.a...@gmail.com]
Sent: Tuesday, June 23, 2015 2:51 AM
To: smazumder
Cc: user
Subject: Re: Support for Windowing and Analytics functions in Spark SQL
1.4 supports it
On 23 Jun 2015 02:59, Sourav Mazumder
It's actually not that tricky.
SPARK_WORKER_CORES is the max size of the executor's task thread pool, which is
the same as saying “one executor with 32 cores can execute 32 tasks
simultaneously”. Spark doesn't care how many real physical CPUs/cores you have
(the OS does), so
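As a config sketch (the value 32 mirrors the example above; this is set in conf/spark-env.sh on each worker):

```shell
# conf/spark-env.sh
# Cap the worker's advertised cores: effectively the executor's task
# thread pool size, a scheduling limit rather than a physical core count.
SPARK_WORKER_CORES=32
```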
Yes, it is thread safe. That’s how Spark SQL JDBC Server works.
Cheng Hao
From: V Dineshkumar [mailto:developer.dines...@gmail.com]
Sent: Wednesday, June 17, 2015 9:44 PM
To: user@spark.apache.org
Subject: Is HiveContext Thread Safe?
Hi,
I have a HiveContext which I am using in multiple
Seems you're hitting the self-join case; currently Spark SQL won't cache any
result/logical tree for further analysis or computation for a self-join. Since
the logical tree is huge, it's reasonable that generating its tree string
recursively takes a long time. And I also doubt the computation can finish
Not sure if Spark RDD will provide an API to fetch records one by one from the
final result set, instead of pulling them all (or whole partitions) into the
driver memory.
Seems a big change.
From: Cheng Lian [mailto:l...@databricks.com]
Sent: Friday, June 12, 2015 3:51 PM
To:
Not sure if Spark Core will provide an API to fetch records one by one from the
block manager, instead of pulling them all into the driver memory.
From: Cheng Lian [mailto:l...@databricks.com]
Sent: Friday, June 12, 2015 3:51 PM
To: 姜超才; Hester wang; user@spark.apache.org
Subject: Re: 回复:
[
https://issues.apache.org/jira/browse/SPARK-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581315#comment-14581315
]
Cheng Hao commented on SPARK-7550:
--
I will start working on this today, sorry
[
https://issues.apache.org/jira/browse/SPARK-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578503#comment-14578503
]
Cheng Hao commented on SPARK-8159:
--
One more question: is it possible to assign
[
https://issues.apache.org/jira/browse/SPARK-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578559#comment-14578559
]
Cheng Hao commented on SPARK-8267:
--
I am working on this.
string function: trim
[
https://issues.apache.org/jira/browse/SPARK-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578502#comment-14578502
]
Cheng Hao commented on SPARK-8159:
--
Agree, it would be easier to track the progress
Is it a large result set returned from the Thrift Server? And can you paste the
SQL and physical plan?
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, June 9, 2015 12:01 PM
To: Sourav Mazumder
Cc: user
Subject: Re: Spark SQL with Thrift Server is very very slow and finally failing
[
https://issues.apache.org/jira/browse/SPARK-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579127#comment-14579127
]
Cheng Hao commented on SPARK-8248:
--
I am working on this.
string function: length
[
https://issues.apache.org/jira/browse/SPARK-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579903#comment-14579903
]
Cheng Hao commented on SPARK-8228:
--
I'll take this one.
conditional function: isnull
[
https://issues.apache.org/jira/browse/SPARK-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579914#comment-14579914
]
Cheng Hao commented on SPARK-8242:
--
I'll take this one.
string function: decode
[
https://issues.apache.org/jira/browse/SPARK-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579916#comment-14579916
]
Cheng Hao commented on SPARK-8244:
--
I'll take this one.
string function: find_in_set
[
https://issues.apache.org/jira/browse/SPARK-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579918#comment-14579918
]
Cheng Hao commented on SPARK-8246:
--
I'll take this one.
string function
[
https://issues.apache.org/jira/browse/SPARK-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579922#comment-14579922
]
Cheng Hao commented on SPARK-8251:
--
I'll take this one.
string function: alias upper
[
https://issues.apache.org/jira/browse/SPARK-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579931#comment-14579931
]
Cheng Hao commented on SPARK-8259:
--
I'll take this one.
string function: rpad
[
https://issues.apache.org/jira/browse/SPARK-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579933#comment-14579933
]
Cheng Hao commented on SPARK-8261:
--
I'll take this one.
string function: space