Hi Jerry,
Do you have speculation enabled? A write which produces one million files /
output partitions might be using tons of driver memory via the
OutputCommitCoordinator's bookkeeping data structures.
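If speculation does turn out to be enabled, it can be switched off explicitly while investigating; the relevant property (per the Spark configuration docs) is:

```
spark.speculation  false
```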
On Sun, Oct 25, 2015 at 5:50 PM, Jerry Lam wrote:
> Hi spark guys,
>
Column 4 is always constant, so it has no predictive power, resulting in a zero weight.
On Sunday, October 25, 2015, Zhiliang Zhu wrote:
> Hi DB Tsai,
>
> Thanks very much for your kind reply and help.
>
> As for your comment, I just modified and tested the key part of the codes:
>
>
Hi Josh,
No, I don't have speculation enabled. The driver ran for a few hours until
it was OOM. Interestingly, all partitions were generated successfully
(the _SUCCESS file is written in the output directory). Is there a reason why
the driver needs so much memory? The jstack revealed that it called
Hi guys,
I mentioned that the partitions are generated, so I tried to read the
partition data from them. The driver OOMs after a few minutes. The stack
trace is below. It looks very similar to the jstack above (note the
refresh method). Thanks!
Name: java.lang.OutOfMemoryError
Message: GC
Hi spark guys,
I think I hit the same issue as SPARK-8890
(https://issues.apache.org/jira/browse/SPARK-8890). It is marked as resolved;
however, it is not. I have over a million output directories for a single
column in partitionBy. Not sure if this is a regression? Do I need to
set some
So yes, the individual artifacts are released; however, there is no
prebuilt deployable bundle for Spark 1.5.1 and Scala 2.11.7, something
like spark-1.5.1-bin-hadoop-2.6_scala-2.11.tgz. The Spark site even
states this:
*Note: Scala 2.11 users should download the Spark source package and
build
Hi guys,
After waiting for a day, it actually caused an OOM on the Spark driver. I
configured the driver to have 6GB. Note that I didn't call refresh myself;
the method was called when saving the DataFrame in Parquet format. Also, I'm
using partitionBy() on the DataFrameWriter to generate over 1
I have the issue resolved. In this case the hostname of my machine was configured
to a public domain that resolves to the EC2 machine's public IP. Binding to an
elastic IP is not allowed. I changed the hostname to Amazon's private hostname
(ip-72-xxx-xxx) and then it works.
Felix,
Missed your reply - agree looks like the same issue, resolved mine as
Duplicate.
Thanks!
Ram
On Sun, Oct 25, 2015 at 2:47 PM, Felix Cheung
wrote:
>
>
> This might be related to https://issues.apache.org/jira/browse/SPARK-10500
>
>
>
> On Sun, Oct 25, 2015 at
Hi,
Does the use of a custom partitioner in Streaming affect performance?
On Mon, Oct 5, 2015 at 1:06 PM, Adrian Tanase wrote:
> Great article, especially the use of a custom partitioner.
>
> Also, sorting by multiple fields by creating a tuple out of them is an
> awesome,
As documented in
http://spark.apache.org/docs/latest/configuration.html#available-properties,
Note for “spark.driver.memory”:
Note: In client mode, this config must not be set through the SparkConf
directly in your application, because the driver JVM has already started at
that point. Instead,
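In other words, the value has to be supplied before the driver JVM starts, for example on the spark-submit command line or in the defaults file (MyApp and my-app.jar are placeholders):

```
# on the command line
spark-submit --driver-memory 6g --class MyApp my-app.jar

# or in conf/spark-defaults.conf
spark.driver.memory  6g
```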
On Monday, October 26, 2015 11:26 AM, Zhiliang Zhu
wrote:
Hi DB Tsai,
Thanks very much for your kind help. I get it now.
I am sorry that there is another issue: the weight/coefficient result is
perfect while A is a triangular matrix; however, while A
Please add "setFitIntercept(false)" to your LinearRegression.
LinearRegression by default includes an intercept in the model, e.g.
label = intercept + features dot weight
To get the result you want, you need to force the intercept to be zero.
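As a sanity check on the statement above, here is a minimal stand-alone Java sketch (plain arrays, not the Spark API) of least squares with and without an intercept. The data is generated from label = 3 + 2 * x, so forcing the fit through the origin (the analogue of setFitIntercept(false) on data that does have an intercept) visibly biases the weight:

```java
// Hypothetical illustration, not Spark code: ordinary least squares for
// label = intercept + weight * x, fitted with and without an intercept term.
public class InterceptDemo {

    // OLS with intercept: weight = cov(x,y)/var(x), intercept = mean(y) - weight*mean(x).
    // Returns {weight, intercept}.
    public static double[] fitWithIntercept(double[] x, double[] y) {
        double mx = mean(x), my = mean(y), sxy = 0, sxx = 0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
        }
        double w = sxy / sxx;
        return new double[] {w, my - w * mx};
    }

    // OLS forced through the origin: weight = sum(x*y) / sum(x*x).
    public static double fitThroughOrigin(double[] x, double[] y) {
        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.length; i++) {
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        return sxy / sxx;
    }

    private static double mean(double[] a) {
        double s = 0;
        for (double v : a) s += v;
        return s / a.length;
    }

    public static void main(String[] args) {
        // Data generated from y = 3 + 2x: the intercept fit recovers
        // weight 2 and intercept 3; the through-origin weight differs.
        double[] x = {1, 2, 3, 4};
        double[] y = {5, 7, 9, 11};
        double[] wb = fitWithIntercept(x, y);
        System.out.println(wb[0] + " " + wb[1] + " " + fitThroughOrigin(x, y));
    }
}
```

Conversely, if your data really is label = features dot weight with no constant term, the through-origin fit is the one that recovers the exact weights, which is the reason for setFitIntercept(false) here.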
Just curious, are you trying to solve systems of
This might be related to https://issues.apache.org/jira/browse/SPARK-10500
On Sun, Oct 25, 2015 at 9:57 AM -0700, "Ted Yu" wrote:
In zipRLibraries():
// create a zip file from scratch, do not append to existing file.
val zipFile = new File(dir, name)
I guess
Ted Yu,
Agree that either picking up sparkr.zip if it already exists, or creating a
zip in a local scratch directory will work. This code is called by the
client side job submission logic and the resulting zip is already added to
the local resources for the YARN job, so I don't think the
Hi DB Tsai,
Thanks very much for your kind reply and help.
As for your comment, I just modified and tested the key part of the codes:
LinearRegression lr = new LinearRegression()
.setMaxIter(1)
.setRegParam(0)
.setElasticNetParam(0); //the number could be reset
final
In zipRLibraries():
// create a zip file from scratch, do not append to existing file.
val zipFile = new File(dir, name)
I guess instead of creating sparkr.zip in the same directory as R lib, the
zip file can be created under some directory writable by the user launching
the app and
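The idea being suggested can be sketched in plain Java (hypothetical names, not the actual SparkR submission code): create the archive under a freshly created scratch directory that is writable by the launching user, instead of next to the R lib.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipToScratchDir {

    // Create name (e.g. "sparkr.zip") under a user-writable scratch
    // directory, add one file to it, and return the zip's path.
    public static Path zipIntoTempDir(String name, Path fileToAdd) {
        try {
            Path scratch = Files.createTempDirectory("sparkr-scratch");
            Path zipFile = scratch.resolve(name);
            try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile))) {
                zos.putNextEntry(new ZipEntry(fileToAdd.getFileName().toString()));
                zos.write(Files.readAllBytes(fileToAdd));
                zos.closeEntry();
            }
            return zipFile;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Self-contained demo: zip a dummy file and return the zip's path.
    public static Path demo() {
        try {
            Path lib = Files.createTempFile("dummy-rlib", ".R");
            Files.write(lib, "x <- 1".getBytes());
            return zipIntoTempDir("sparkr.zip", lib);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The resulting path can then be added to the job's local resources exactly as before; only the location of the scratch file changes.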
A dependency couldn't be downloaded:
[INFO] +- com.h2database:h2:jar:1.4.183:test
Have you checked your network settings ?
Cheers
On Sun, Oct 25, 2015 at 10:22 AM, Bilinmek Istemiyor
wrote:
> Thank you for the quick reply. You are a godsend. I have long not been
>
Dear All,
I have a program below which makes me very confused; it is about a
multi-dimensional linear regression model. The weight / coefficient is always
perfect while the dimension is smaller than 4, otherwise it is wrong
all the time. Or, whether the
Thank you for the quick reply. You are a godsend. I have long not been
programming in Java and know nothing about Maven, Scala, sbt, and Spark stuff.
I used Java 7 since the build failed with Java 8. Which Java version do you
advise in general for Spark? I can downgrade the Scala version as well. Can
you
LinearRegressionWithSGD is not stable. Please use linear regression in
ML package instead.
http://spark.apache.org/docs/latest/ml-linear-methods.html
Sincerely,
DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
On Sun, Oct 25,
Hi Bilinmek,
Spark 1.5.x does not support Scala 2.11.7, so the easiest thing to do is
build it like you're trying. Here are the steps I followed to build it in a
Mac OS X 10.10.5 environment; it should be very similar on Ubuntu.
1. Set the JAVA_HOME environment variable in my bash session via export
Hm, why do you say it doesn't support 2.11? It does.
It is not even this difficult; you just need a source distribution,
and then run "./dev/change-scala-version.sh 2.11" as you say. Then
build as normal
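For reference, the source build described here amounts to (flags per the 1.5.x build docs; exact Maven profiles may vary with your Hadoop version):

```
./dev/change-scala-version.sh 2.11
mvn -Dscala-2.11 -DskipTests clean package
```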
On Sun, Oct 25, 2015 at 4:00 PM, Todd Nist wrote:
> Hi Bilinmek,
>
>
I have not been able to start the Spark Scala shell since 1.5, as it was not able
to create the sqlContext during startup. It complains that the metastore_db
is already locked: "Another instance of Derby may have already booted the
database". The Derby log is attached.
I only have this problem with
Have you taken a look at the fix for SPARK-11000 which is in the upcoming
1.6.0 release ?
Cheers
On Sun, Oct 25, 2015 at 8:42 AM, Yao wrote:
> I have not been able to start Spark scala shell since 1.5 as it was not
> able
> to create the sqlContext during the startup. It
thanks i will read up on that
On Sat, Oct 24, 2015 at 12:53 PM, Ted Yu wrote:
> The code below was introduced by SPARK-7673 / PR #6225
>
> See item #1 in the description of the PR.
>
> Cheers
>
> On Sat, Oct 24, 2015 at 12:59 AM, Koert Kuipers wrote:
>
If you run sparkR in yarn-client mode, it fails with
Exception in thread "main" java.io.FileNotFoundException:
/usr/hdp/2.3.2.1-12/spark/R/lib/sparkr.zip (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at
Thanks. I wonder why this is not widely reported in the user forum. The REPL
shell is basically broken in 1.5.0 and 1.5.1.
-Yao
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Sunday, October 25, 2015 12:01 PM
To: Ge, Yao (Y.)
Cc: user
Subject: Re: Spark scala REPL - Unable to create sqlContext
If you have a pull request, Jenkins can test your change for you.
FYI
> On Oct 25, 2015, at 12:43 PM, Richard Eggert wrote:
>
> Also, if I run the Maven build on Windows or Linux without setting
> -DskipTests=true, it hangs indefinitely when it gets to
>
Yes, I know, but it would be nice to be able to test things myself before I
push commits.
On Sun, Oct 25, 2015 at 3:50 PM, Ted Yu wrote:
> If you have a pull request, Jenkins can test your change for you.
>
> FYI
>
> On Oct 25, 2015, at 12:43 PM, Richard Eggert
Sorry Sean, you are absolutely right, it supports 2.11; all I meant is that there
is no release available as a standard download and that one has to build
it. Thanks for the clarification.
-Todd
On Sunday, October 25, 2015, Sean Owen wrote:
> Hm, why do you say it doesn't support
When I try to start up sbt for the Spark build, or if I try to import it
in IntelliJ IDEA as an sbt project, it fails with a "No such file or
directory" error when it attempts to "git clone" sbt-pom-reader into
.sbt/0.13/staging/some-sha1-hash.
If I manually create the expected directory before
Also, if I run the Maven build on Windows or Linux without setting
-DskipTests=true, it hangs indefinitely when it gets to
org.apache.spark.JavaAPISuite.
It's hard to test patches when the build doesn't work. :-/
On Sun, Oct 25, 2015 at 3:41 PM, Richard Eggert
wrote:
By "it works", I mean, "It gets past that particular error". It still fails
several minutes later with a different error:
java.lang.IllegalStateException: impossible to get artifacts when data has
not been loaded. IvyNode = org.scala-lang#scala-library;2.10.3
On Sun, Oct 25, 2015 at 3:38 PM,
No, 2.11 artifacts are in fact published:
http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-parent_2.11%22
On Sun, Oct 25, 2015 at 7:37 PM, Todd Nist wrote:
> Sorry Sean, you are absolutely right, it supports 2.11; all I meant is there is
> no release available as a