Spark RC5 - OutOfMemoryError: Requested array size exceeds VM limit

2016-07-25 Thread Ovidiu-Cristian MARCU
Hi, I am running some TPC-DS queries (data is Parquet stored in HDFS) with Spark 2.0 RC5, and for some queries I get this OOM: java.lang.OutOfMemoryError: Requested array size exceeds VM limit at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:73)
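
This error usually means a single row buffer tried to grow past the JVM's maximum array size (just under 2 GB). A minimal mitigation sketch, not confirmed by the thread: raise shuffle parallelism so no single task has to build such a buffer. The partition count and HDFS path below are illustrative assumptions.

    // Sketch: increase shuffle parallelism so per-task row buffers stay well under the ~2 GB array limit.
    // The value 800 and the dataset path are illustrative, not taken from the thread.
    import org.apache.spark.sql.SparkSession
    val spark = SparkSession.builder().appName("tpcds-debug").getOrCreate()
    spark.conf.set("spark.sql.shuffle.partitions", "800")              // default is 200
    val storeSales = spark.read.parquet("hdfs:///tpcds/store_sales")   // hypothetical path
    storeSales.createOrReplaceTempView("store_sales")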

Re: orc/parquet sql conf

2016-07-25 Thread Ovidiu-Cristian MARCU
https://github.com/apache/spark/pull/13775 > Thanks! > 2016-07-25 19:01 GMT+09:00 Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr>: > Hi, > Assuming I have some

orc/parquet sql conf

2016-07-25 Thread Ovidiu-Cristian MARCU
Hi, Assuming I have some data in both ORC and Parquet formats, and some complex workflow that eventually combines the results of some queries on these datasets, I would like to get the best execution. Looking at the default configs I noticed: 1) Vectorized query execution is possible with Parquet
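
For context, in Spark 2.0 the vectorized columnar reader applies to Parquet scans but not to ORC. A minimal sketch of inspecting and setting the relevant flag from a SparkSession (the option name is the standard Spark 2.0 SQL config; paths and column names are hypothetical):

    // Parquet can use the vectorized reader (enabled by default in 2.0); ORC goes through the row-based reader.
    println(spark.conf.get("spark.sql.parquet.enableVectorizedReader"))   // "true" by default
    spark.conf.set("spark.sql.parquet.enableVectorizedReader", "true")
    val fromParquet = spark.read.parquet("hdfs:///data/events_parquet")   // hypothetical path
    val fromOrc     = spark.read.orc("hdfs:///data/events_orc")           // hypothetical path
    // Combining results from both sources in one workflow (assumes identical schemas):
    fromParquet.union(fromOrc).groupBy("event_type").count().show()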

DMTCP and debug a failed stage in spark

2016-06-16 Thread Ovidiu-Cristian MARCU
Hi, I have a TPC-DS query that fails in stage 80, which is a ResultStage (Spark SQL). Ideally I would like to ‘checkpoint’ a previous stage that executed successfully and replay the failed stage for debugging purposes. Has anyone managed to do something similar and could share some hints? Maybe
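
Spark has no built-in way to re-run a single stage, but a common workaround (a sketch only, not something confirmed in this thread) is to materialize the intermediate result that feeds the failing stage and then iterate on the tail of the query against that saved copy. Table, column, and path names below are hypothetical.

    // 1) Materialize the part of the job that succeeds (everything up to the failing ResultStage).
    val intermediate = spark.table("store_sales").groupBy("ss_store_sk").count()    // hypothetical upstream work
    intermediate.write.mode("overwrite").parquet("hdfs:///tmp/debug/stage80_input") // hypothetical path

    // 2) Replay only the failing logic against the saved input, without recomputing upstream stages.
    val replayInput = spark.read.parquet("hdfs:///tmp/debug/stage80_input")
    replayInput.filter("count > 100").show()   // hypothetical stand-in for the failing transformation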

Re: tpcds q1 - java.lang.NegativeArraySizeException

2016-06-14 Thread Ovidiu-Cristian MARCU
spark.akka.frameSize 128 spark.shuffle.manager sort > On 14 Jun 2016, at 00:12, Sameer Agarwal <sam...@databricks.com> wrote: > I'm unfortunately not able to reproduce this on master. Does the query always fail deterministically? > On Mon, Jun 13,

Re: tpcds q1 - java.lang.NegativeArraySizeException

2016-06-13 Thread Ovidiu-Cristian MARCU
Yes, commit ad102af > On 13 Jun 2016, at 21:25, Reynold Xin <r...@databricks.com> wrote: > Did you try this on master? > On Mon, Jun 13, 2016 at 11:26 AM, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr>

tpcds q1 - java.lang.NegativeArraySizeException

2016-06-13 Thread Ovidiu-Cristian MARCU
Hi, Running the first query of TPC-DS on a standalone setup (4 nodes, tpcds2 data generated at scale 10 and transformed into Parquet under HDFS) results in an exception [1]. Close to this problem I found the issue https://issues.apache.org/jira/browse/SPARK-12089
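
SPARK-12089 points at the likely mechanism: BufferHolder doubles its byte buffer while writing a row, and once the buffer is already close to 2 GB the requested new size overflows a signed Int, which surfaces as a negative array length. A tiny illustrative sketch of the arithmetic (not Spark's actual code):

    // Illustrative only: naive doubling of a large buffer size overflows Int.
    val used = 1500000000       // ~1.5 GB already written into the row buffer
    val doubled = used + used   // overflows to -1294967296
    println(doubled)            // a negative length passed to new Array[Byte](...) throws NegativeArraySizeException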

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-06 Thread Ovidiu-Cristian MARCU
+1 for moving this discussion to a proactive new (alpha/beta) release of Apache Spark 2.0! > On 06 Jun 2016, at 20:25, Ovidiu Cristian Marcu <oma...@inria.fr> wrote: > Any chance to start preparing a new alpha/beta release for 2.0 this month or the preview will

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-05 Thread Ovidiu-Cristian MARCU
Hi all, IMHO the preview ‘release’ is good as it is now, so no further changes are required. For me the preview was a trigger for what will be the next Spark 2.0; I really appreciate the effort the guys made to describe it and market it :) I’d appreciate it if the Apache Spark team would start a vote for a new

Re: Running TPCDSQueryBenchmark results in java.lang.OutOfMemoryError

2016-05-24 Thread Ovidiu-Cristian MARCU
Do you need more information? > On 23 May 2016, at 19:16, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote: > Yes, > git log > commit dafcb05c2ef8e09f45edfb7eabf58116c23975a0 > Author: Sameer Agarwal <sam...@databricks.com>

Re: Running TPCDSQueryBenchmark results in java.lang.OutOfMemoryError

2016-05-23 Thread Ovidiu-Cristian MARCU
On 23 May 2016, at 18:16, Ted Yu <yuzhih...@gmail.com> wrote: > Can you tell us the commit hash using which the test was run? > For #2, if you can give the full stack trace, that would be nice. > Thanks > On Mon, May 23, 2016 at 8:58 AM, Ovidiu-Cris

Running TPCDSQueryBenchmark results in java.lang.OutOfMemoryError

2016-05-23 Thread Ovidiu-Cristian MARCU
Hi, 1) Using the latest Spark 2.0 I've managed to run the first 9 queries of TPCDSQueryBenchmark, and then it ends with an OutOfMemoryError [1]. What configuration was used for running this benchmark? Can you explain the meaning of 4 shuffle partitions? Thanks! On my local system I use:
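
For reference, the "4 shuffle partitions" most likely refers to the benchmark deliberately running with a very small spark.sql.shuffle.partitions so that per-query overheads dominate the measurement; that is an assumption, not confirmed in the thread. A sketch of a comparable local setup (memory handling and data path are illustrative):

    // Sketch of a local session resembling the benchmark's setup; values are assumptions.
    import org.apache.spark.sql.SparkSession
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("tpcds-local")
      .config("spark.sql.shuffle.partitions", "4")   // the small partition count the thread asks about
      .getOrCreate()
    // Note: driver memory must be set before the JVM starts, e.g. spark-submit --driver-memory 8g.
    spark.read.parquet("hdfs:///tpcds/catalog_sales").createOrReplaceTempView("catalog_sales")   // hypothetical path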

Re: Building spark master failed

2016-05-23 Thread Ovidiu-Cristian MARCU
> On Mon, May 23, 2016 at 2:16 AM, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote: > Hi > I have the following issue when trying to build the latest spark source code on master: > /spark/com

Building spark master failed

2016-05-23 Thread Ovidiu-Cristian MARCU
Hi, I have the following issue when trying to build the latest Spark source code on master: /spark/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java:147: error: cannot find symbol [error] if (process != null && process.isAlive()) { [error]
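
Process.isAlive() only exists from Java 8 onward, so a "cannot find symbol" at that line usually means the build resolved an older JDK. A quick check from a Scala REPL or spark-shell (a sketch; it only verifies the JVM the shell itself is running on):

    // Prints the running JVM version and whether java.lang.Process exposes isAlive() (Java 8+).
    println(System.getProperty("java.version"))
    println(classOf[java.lang.Process].getMethods.exists(_.getName == "isAlive"))   // false on Java 7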

Re: [vote] Apache Spark 2.0.0-preview release (rc1)

2016-05-18 Thread Ovidiu-Cristian MARCU
find that by changing the filter to target version = 2.0.0. Cheers. > On Wed, May 18, 2016 at 9:00 AM, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote: > +1 Great, I see the list of resolved issues, do yo

Re: [vote] Apache Spark 2.0.0-preview release (rc1)

2016-05-18 Thread Ovidiu-Cristian MARCU
+1 Great, I see the list of resolved issues; do you have a list of known issues that will stay with this release? With build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Phive -Phive-thriftserver -DskipTests clean package; mvn -version: Apache Maven 3.3.9