Hi Sandy,
Any resolution for the YARN failures? It's a blocker for running Spark on
top of YARN.
Thanks.
Deb
On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng men...@gmail.com wrote:
Hi Deb,
I think this may be the same issue as described in
https://issues.apache.org/jira/browse/SPARK-2121. We
Hi All,
Sorry for my late reply!
Yu Ishikawa, thanks for your interest in the Saury project. You are welcome
to try it out. If you have questions about it, please email me. We keep
improving performance and adding features for the project.
Xiangrui, thanks for your encouragement. If you
I've been looking at performance differences between spark sql queries
against single parquet tables, vs a unionAll of two tables. It's a
significant difference, like 5 to 10x
Is there a reason in general not to push projections and predicates down
into the individual ParquetTableScans in a
On Tue, Sep 9, 2014 at 10:17 AM, Cody Koeninger c...@koeninger.org wrote:
Is there a reason in general not to push projections and predicates down
into the individual ParquetTableScans in a union?
This would be a great case to add to ColumnPruning. Would be awesome if
you could open a JIRA
Hi Deb,
The current state of the art is to increase
spark.yarn.executor.memoryOverhead until the job stops failing. We do have
plans to try to automatically scale this based on the amount of memory
requested, but it will still just be a heuristic.
-Sandy
On Tue, Sep 9, 2014 at 7:32 AM,
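For anyone hitting this later, a minimal sketch of how the overhead is typically bumped (the values, master setting, and jar name here are purely illustrative, not a recommendation):

```shell
# Illustrative only: raise the per-executor overhead YARN reserves beyond
# the JVM heap (value is in MB); tune upward until containers stop being killed.
spark-submit \
  --master yarn-cluster \
  --executor-memory 16g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  my-app.jar
```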
FWIW, consensus from Cloudera folks seems to be that there's no need or
demand on this end for YARN alpha. Even removing it sooner wouldn't have
an impact here.
It will be a small positive to reduce complexity by removing this
support, making it a little easier to develop for current YARN
Opened
https://issues.apache.org/jira/browse/SPARK-3462
I'll take a look at ColumnPruning and see what I can do
On Tue, Sep 9, 2014 at 12:46 PM, Michael Armbrust mich...@databricks.com
wrote:
On Tue, Sep 9, 2014 at 10:17 AM, Cody Koeninger c...@koeninger.org
wrote:
Is there a reason in
Thanks!
On Tue, Sep 9, 2014 at 11:07 AM, Cody Koeninger c...@koeninger.org wrote:
Opened
https://issues.apache.org/jira/browse/SPARK-3462
I'll take a look at ColumnPruning and see what I can do
On Tue, Sep 9, 2014 at 12:46 PM, Michael Armbrust mich...@databricks.com
wrote:
On Tue, Sep
I'm kind of surprised this was not run into before. Do people not
segregate their data by day/week in the HDFS directory structure?
On Tue, Sep 9, 2014 at 2:08 PM, Michael Armbrust mich...@databricks.com
wrote:
Thanks!
On Tue, Sep 9, 2014 at 11:07 AM, Cody Koeninger c...@koeninger.org
I think usually people add these directories as multiple partitions of the
same table instead of union. This actually allows us to efficiently prune
directories when reading in addition to standard column pruning.
On Tue, Sep 9, 2014 at 11:26 AM, Gary Malouf malouf.g...@gmail.com wrote:
I'm
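A hedged sketch of the partitioned-table approach Michael describes, for a Hive metastore setup (the table name, schema, and locations are hypothetical): each day's directory is registered as a partition of one table, so a predicate on the partition column skips whole directories before any Parquet file is opened.

```sql
-- Hypothetical table over per-day directories; all names are illustrative.
CREATE EXTERNAL TABLE events (id BIGINT, payload STRING)
PARTITIONED BY (day STRING)
STORED AS PARQUET;

ALTER TABLE events ADD PARTITION (day = '2014-09-08') LOCATION '/foo/d1';
ALTER TABLE events ADD PARTITION (day = '2014-09-09') LOCATION '/foo/d2';

-- The day predicate prunes directories, in addition to column pruning.
SELECT count(*) FROM events WHERE day = '2014-09-09';
```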
Hmm... I did try increasing it to a few GB but did not get a successful run
yet...
Any idea, if I am using say 40 executors each running 16 GB, what's the
typical spark.yarn.executor.memoryOverhead for, say, 100M x 10M
matrices with a few billion ratings...
On Tue, Sep 9, 2014 at 10:49 AM,
Maybe I'm missing something, I thought parquet was generally a write-once
format and the sqlContext interface to it seems that way as well.
d1.saveAsParquetFile("/foo/d1")
// another day, another table, with same schema
d2.saveAsParquetFile("/foo/d2")
Will give a directory structure like
I think what Michael means is people often use this to read existing
partitioned Parquet tables that are defined in a Hive metastore rather
than data generated directly from within Spark and then reading it
back as a table. I'd expect the latter case to become more common, but
for now most users
I would expect 2 GB to be enough, or more than enough, for 16 GB executors
(unless ALS is using a bunch of off-heap memory?). You mentioned earlier
in this thread that the property wasn't showing up in the Environment tab.
Are you sure it's making it in?
-Sandy
On Tue, Sep 9, 2014 at 11:58
What Patrick said is correct. Two other points:
- In the 1.2 release we are hoping to beef up the support for working with
partitioned parquet independent of the metastore.
- You can actually do operations like INSERT INTO for parquet tables to
add data. This creates new parquet files for each
since the power incident last thursday, the github pull request builder
plugin is still not really working 100%. i found an open issue
w/jenkins[1] that could definitely be affecting us, i will be pausing
builds early thursday morning and then restarting jenkins.
i'll send out a reminder
Last time it did not show up on the Environment tab, but I will give it
another shot... The expected behavior is that this property will show up
there, right?
On Tue, Sep 9, 2014 at 12:15 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
I would expect 2 GB would be enough or more than enough for 16 GB
We were using it until recently; we are talking to our customers to see if
we can get off it.
Chester
Alpine Data Labs
On Tue, Sep 9, 2014 at 10:59 AM, Sean Owen so...@cloudera.com wrote:
FWIW consensus from Cloudera folk seems to be that there's no need or
demand on this end for YARN
Ok, so looking at the optimizer code for the first time and trying the
simplest rule that could possibly work,
object UnionPushdown extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // Push down filter into union
    case f @ Filter(condition, u @
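Independent of Catalyst, the shape of such a rule can be sketched in plain Scala on a toy plan algebra (all names below are illustrative stand-ins, not Spark's actual classes): a Filter over a Union is rewritten into a Union of Filters, which is what would let each side's scan see the predicate.

```scala
// Toy plan algebra, not Catalyst: just enough to show the rewrite.
sealed trait Plan
case class Scan(rows: Seq[Int]) extends Plan
case class Union(left: Plan, right: Plan) extends Plan
case class Filter(pred: Int => Boolean, child: Plan) extends Plan

// Rewrite Filter(Union(l, r)) into Union(Filter(l), Filter(r)) so the
// predicate is applied on each side of the union independently.
def pushDown(plan: Plan): Plan = plan match {
  case Filter(p, Union(l, r)) => Union(pushDown(Filter(p, l)), pushDown(Filter(p, r)))
  case Filter(p, child)       => Filter(p, pushDown(child))
  case Union(l, r)            => Union(pushDown(l), pushDown(r))
  case s: Scan                => s
}

// Naive evaluator, used only to check the rewrite preserves semantics.
def eval(plan: Plan): Seq[Int] = plan match {
  case Scan(rows)   => rows
  case Union(l, r)  => eval(l) ++ eval(r)
  case Filter(p, c) => eval(c).filter(p)
}

val original  = Filter(_ > 2, Union(Scan(Seq(1, 2, 3)), Scan(Seq(4, 5))))
val rewritten = pushDown(original)
assert(eval(original) == eval(rewritten)) // same results after the rewrite
```

The real rule additionally has to handle projections and leave the filter in place when it references columns the union cannot resolve per-side, but the transformation skeleton is the same.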
Hi all,
I am calling an object which in turn calls a method inside an RDD map in
Spark. While writing the tests, how can I mock that object's call? Currently
I do doNothing().when(class).method(), but it gives a "task not serializable"
exception. I tried making the class both a spy
Can you be a little bit more specific, maybe give a code snippet?
On Tue, Sep 9, 2014 at 5:14 PM, Sudershan Malpani
sudershan.malp...@gmail.com wrote:
Hi all,
I am calling an object which in turn is calling a method inside a map RDD
in spark. While writing the tests how can I mock that
Class1.java
@Autowired
private ClassX cx;
public List method1(JavaPairRDD<Object, List> data) {
    List list1 = new ArrayList();
    List list2 = new ArrayList();
    JavaRDD<List> computed = data.map(
        new Function<Tuple2<Object, List>, List>() {
            public List call(Tuple2<Object, List> obj) throws