Hi,
I was trying to understand why this unit test passes in the Spark code, in
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
for this unit test:
test("Star Expansion - CreateStruct and CreateArray") {
val structDf =
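For context, a minimal sketch of what CreateStruct/CreateArray expansion looks
like (the data and names are illustrative, not the actual test body, which is
truncated above; assume a spark-shell sqlContext):

  import org.apache.spark.sql.functions.{array, struct}

  // Illustrative data; the real test uses fixtures from DataFrameSuite.
  val df = sqlContext.createDataFrame(Seq((1, 2), (3, 4))).toDF("a", "b")

  // struct() and array() compile down to CreateStruct and CreateArray.
  val structDf = df.select(struct(df("a"), df("b")).alias("record"))
  val arrayDf  = df.select(array(df("a"), df("b")).alias("items"))

  structDf.printSchema()  // record: struct<a:int, b:int>
  arrayDf.printSchema()   // items: array<int>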
Hi,
I ran into a problem when using "spark.driver.userClassPathFirst" together with
SPARK_CLASSPATH in spark-env.sh in a Standalone environment, and want to confirm
that it in fact has no good solution.
We are running Spark 1.5.2 in standalone mode on a cluster. Since the cluster
doesn't have the direct
It is described in "Hadoop: The Definitive Guide", chapter 3, under "File Patterns":
https://www.safaribooksonline.com/library/view/hadoop-the-definitive/9781449328917/ch03.html#FilePatterns
Yong
From: pradeep1...@gmail.com
Date: Wed, 13 Apr 2016 18:56:58 +
Subject: how does sc.textFile translate regex in
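For context, a minimal sketch of the glob behavior in question (paths are
hypothetical): sc.textFile hands its path string to Hadoop's FileInputFormat,
which expands Hadoop glob patterns rather than full regular expressions.

  // Hadoop glob syntax: * (any chars), ? (one char), [ab] (set),
  // {a,b} (alternation); expansion happens in Hadoop, not in Spark.
  val logs = sc.textFile("hdfs:///logs/2016-0{1,2,3}/part-*")
  println(logs.count())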
Is the env/dev/log4j-executor.properties file inside your jar file? Does its
path match what you specified as env/dev/log4j-executor.properties?
If you read the log4j document here:
https://logging.apache.org/log4j/1.2/manual.html
When you specify the
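A hedged sketch of one common setup (the file name comes from the thread;
whether it fits this user's deployment is an assumption): ship the properties
file to every executor via spark.files and point log4j at its basename, since
shipped files land in the executor's working directory.

  import org.apache.spark.SparkConf

  // Both keys are standard Spark configs; the path is from the thread.
  val conf = new SparkConf()
    .set("spark.files", "env/dev/log4j-executor.properties")
    .set("spark.executor.extraJavaOptions",
      "-Dlog4j.configuration=log4j-executor.properties")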
If they do that, they must provide a customized input format, rather than going
through JDBC.
Yong
Date: Wed, 6 Apr 2016 23:56:54 +0100
Subject: Re: Sqoop on Spark
From: mich.talebza...@gmail.com
To: mohaj...@gmail.com
CC: jornfra...@gmail.com; msegel_had...@hotmail.com; guha.a...@gmail.com;
1) Is a struct in Spark like a struct in C++?
Kind of. It's an ordered collection of data with known names/types.
2) What is an alias in this context?
It assigns a name to the column, similar to AS in SQL.
3) How does this code even work?
Ordering
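A minimal sketch of the first two answers (data and names are illustrative;
assume a spark-shell sqlContext):

  import org.apache.spark.sql.functions.struct

  val df = sqlContext.createDataFrame(Seq((1, "x"), (2, "y"))).toDF("id", "tag")

  // struct() packs columns into one ordered, named value; alias() names
  // the resulting column, exactly like AS in SQL.
  val nested = df.select(struct(df("id"), df("tag")).alias("record"))
  nested.printSchema()  // record: struct<id:int, tag:string>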
that each
partition from one DF joins with the partition with the same key of DF 2 on the
worker node, without shuffling the data. In other words, do as much work as
possible within the worker node before shuffling the data.
Thanks,
Darshan Singh
On Wed, Apr 6, 2016 at 10:06 PM, Yong Zhang <java8...@hotmail.com>
Thanks
On Wed, Apr 6, 2016 at 8:37 PM, Yong Zhang <java8...@hotmail.com> wrote:
What you are looking for is https://issues.apache.org/jira/browse/SPARK-4849
This feature is available in Spark 1.6.0, so the DataFrame can reuse the
partitioned data in the join.
For your case in 1.5.x, you have
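A hedged sketch of the 1.6.0 behavior being referenced (DataFrame and column
names are placeholders): repartition both sides by the join key and cache them,
so the join can reuse the existing partitioning instead of adding another
shuffle.

  // Spark 1.6+: repartition(Column*) hash-partitions by expression, and
  // the cached layout can then be reused by the join.
  val left  = df1.repartition(df1("key")).cache()
  val right = df2.repartition(df2("key")).cache()

  val joined = left.join(right, left("key") === right("key"))
  joined.explain(true)  // ideally no extra Exchange above the join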
Hi, Michael:
I would like to ask the same question: if the DF is hash-partitioned and then
cached, and we now query/filter by the column it was hashed on for partitioning,
will Spark be smart enough to do the partition pruning in this case, instead of
depending on Parquet's partition pruning? I think that is the
You need to show us the execution plan, so we can understand what is your issue.
Use the spark shell code to show how your DF is built, how you partition them,
then use explain(true) on your join DF, and show the output here, so we can
better help you.
Yong
> Date: Tue, 5 Apr 2016 09:46:59
In standalone mode, it applies to the driver JVM's heap size.
You should consider giving it enough memory in standalone mode, because:
1) Any data you bring back to the driver is stored in it, as with RDD.collect
or DF.show.
2) The driver also hosts a web UI for the application.
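A minimal sketch of point 1 (sizes are illustrative): collect() materializes
the entire result in the driver JVM, so the driver heap, set before launch with
e.g. spark-submit --driver-memory 4g, must be able to hold it.

  // Everything returned by collect() lives in the driver's heap; on a
  // large RDD this alone can exhaust a small default driver memory.
  val all: Array[Int] = sc.parallelize(1 to 10000000).collect()
  println(all.length)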
Is there a configuration in Spark for the location of "shuffle spilling"? I
don't recall ever seeing that one. Can you share it?
It would be good for a test writing to a RAM disk if that configuration is
available.
Thanks
Yong
From: r...@databricks.com
Date: Fri, 1 Apr 2016 15:32:23 -0700
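For reference, and answering from general Spark knowledge rather than the
truncated reply above: shuffle and spill files are written under
spark.local.dir (or the SPARK_LOCAL_DIRS from spark-env.sh in standalone mode),
so pointing it at a RAM disk is how such a test would be run. The path below is
hypothetical.

  import org.apache.spark.SparkConf

  // spark.local.dir controls where shuffle/spill files go;
  // /mnt/ramdisk is assumed to be a tmpfs mount.
  val conf = new SparkConf().set("spark.local.dir", "/mnt/ramdisk/spark")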
I agree that there won't be a generic solution for these kinds of cases.
Without a CBO in Spark or the Hadoop ecosystem in the near future, maybe Spark
DataFrame/SQL should support more hints from the end user, since in these cases
end users are smart enough to tell the engine what is the correct
Hi, Sparkers
Our cluster is running Spark 1.5.2 in Standalone mode.
It ran fine for weeks, but today I found that the master crashed due to OOM.
We have several ETL jobs that run daily on Spark, plus ad-hoc jobs. I can see
the "Completed Applications" table grow in the master UI.
Originally I set
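A hedged guess at the setting being described (the snippet truncates before it
is named): the standalone master keeps completed applications' metadata and UI
state on its own heap, capped by spark.deploy.retainedApplications, which is
usually set on the master via SPARK_MASTER_OPTS.

  // In spark-env.sh on the master (the value 50 is illustrative):
  //   export SPARK_MASTER_OPTS="-Dspark.deploy.retainedApplications=50"
  // Lowering it bounds how many completed apps the master retains in memory.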
> From: dav...@databricks.com
> To: java8...@hotmail.com
> CC: user@spark.apache.org
>
> On Wed, Mar 23, 2016 at 10:35 AM, Yong Zhang <java8...@hotmail.com> wrote:
> > Here is the output:
> >
> > == Parsed Logical Plan ==
> > Project [400+ columns]
> > +- Project [400+ columns]
> >+- Project [400+ columns]
&g
> The broadcast hint does not work as expected in this case, could you
> also show the logical plan by 'explain(true)'?
>
> On Wed, Mar 23, 2016 at 8:39 AM, Yong Zhang <java8...@hotmail.com> wrote:
> >
> > So I am testing this code to understand "broadcast"
When I tried to compile Spark 1.5.2 with -Phive-0.12.0, Maven gave me back an
error that the profile doesn't exist any more.
But when I read the Spark SQL programming guide here:
http://spark.apache.org/docs/1.5.2/sql-programming-guide.html
It keeps mentioning that Spark 1.5.2 can still work with
Hi,
I am testing Spark 1.5.2 on a 4-node cluster with 64G of memory each; one node
is the master and 3 are workers. I am using Standalone mode.
Here is my spark-env.sh:
export SPARK_LOCAL_DIRS=/data1/spark/local,/data2/spark/local,/data3/spark/local
export