RE: SequenceFile and object reuse

2015-11-19 Thread andrew.rowson
As I understand it, it's down to how Hadoop FileInputFormats work, and questions of mutability. If you were to read a file from Hadoop via an InputFormat with a simple Java program, the InputFormat's RecordReader creates a single, mutable instance of the Writable key class and a single, mutable
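To illustrate the reuse behaviour described above, here is a minimal sketch in plain Python, not Hadoop code. The `Text` class and `read_records` function are hypothetical stand-ins for a mutable Writable and a RecordReader, written only to show the aliasing effect; they are assumptions for illustration and do not reproduce any real Hadoop or Spark API.

```python
class Text:
    """Hypothetical stand-in for a mutable Writable (not the real Hadoop class)."""

    def __init__(self, value=""):
        self.value = value

    def set(self, value):
        self.value = value

    def copy(self):
        return Text(self.value)


def read_records(lines):
    """Simulated record reader: yields the SAME Text instance for every record."""
    holder = Text()
    for line in lines:
        holder.set(line)  # overwrite the single shared instance in place
        yield holder


lines = ["a", "b", "c"]

# Holding references without copying: every element is the one reused instance,
# so all of them show whatever was read last.
aliased = list(read_records(lines))
aliased_values = [t.value for t in aliased]

# Copying each record before keeping it preserves the individual values.
copied = [t.copy() for t in read_records(lines)]
copied_values = [t.value for t in copied]

print(aliased_values)
print(copied_values)
```

The point of the sketch is only that collecting or caching records from a reader that reuses its objects requires a copy of each record first; how that copy is made in a real job depends on the Writable type involved.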

Re: new datasource

2015-11-19 Thread Michael Armbrust
Yeah, CatalystScan should give you everything we can possibly push down in raw form. Note that this is not compatible across different Spark versions. On Thu, Nov 19, 2015 at 8:55 AM, james.gre...@baesystems.com < james.gre...@baesystems.com> wrote: > Thanks Hao > > > > I have written a new

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-19 Thread Ted Yu
Should a new job be set up under Spark-Master-Maven-with-YARN for Hadoop 2.6.x? Cheers On Thu, Nov 19, 2015 at 5:16 PM, 张志强(旺轩) wrote: > I agreed > +1 > > -- > From: Reynold Xin > Date

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-19 Thread 张志强(旺轩)
I agreed +1 -- From: Reynold Xin Date: 2015-11-20 06:14:44 To: dev@spark.apache.org; Sean Owen; Thomas Graves Subject: Dropping support for earlier Hadoop

Re: Removing the Mesos fine-grained mode

2015-11-19 Thread Jo Voordeckers
As a recent fine-grained mode adopter I'm now confused after reading this and other resources from spark-summit, the docs, ... so can someone please advise me for our use-case? We'll have 1 or 2 streaming jobs and will run scheduled batch jobs which should take resources away from the

Re: spark-submit is throwing NPE when trying to submit a random forest model

2015-11-19 Thread Joseph Bradley
Hi, Could you please submit this via JIRA as a bug report? It will be very helpful if you include the Spark version, system details, and other info too. Thanks! Joseph On Thu, Nov 19, 2015 at 1:21 PM, Rachana Srivastava < rachana.srivast...@markmonitor.com> wrote: > *Issue:* > > I have a random

spark-submit is throwing NPE when trying to submit a random forest model

2015-11-19 Thread Rachana Srivastava
Issue: I have a random forest model that I am trying to load during streaming using the following code. The code works fine when I run it from Eclipse, but I get an NPE when running it with spark-submit. JavaStreamingContext jssc = new JavaStreamingContext(jsc,

Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-19 Thread Reynold Xin
I proposed dropping support for Hadoop 1.x in the Spark 2.0 email, and I think everybody is for that. https://issues.apache.org/jira/browse/SPARK-11807 Sean suggested also dropping support for Hadoop 2.2, 2.3, and 2.4. That is to say, keep only Hadoop 2.6 and greater. What are the community's

Removing the Mesos fine-grained mode

2015-11-19 Thread Iulian Dragoș
Hi all, Mesos is the only cluster manager that has a fine-grained mode, but it's more often than not problematic, and it's a maintenance burden. I'd like to suggest removing it in the 2.0 release. A few reasons: - code/maintenance complexity. The two modes duplicate a lot of functionality (and

new datasource

2015-11-19 Thread james.gre...@baesystems.com
We have written a new Spark DataSource that uses both Parquet and ElasticSearch. It is based on the existing Parquet DataSource. When I look at the filters being pushed down to buildScan I don’t get anything representing any filters based on UDFs – or for any fields generated by an explode

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-19 Thread Henri Dubois-Ferriere
+1 On 19 November 2015 at 14:14, Reynold Xin wrote: > I proposed dropping support for Hadoop 1.x in the Spark 2.0 email, and I > think everybody is for that. > > https://issues.apache.org/jira/browse/SPARK-11807 > > Sean suggested also dropping support for Hadoop 2.2, 2.3,

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-19 Thread Jean-Baptiste Onofré
+1 Regards JB On 11/19/2015 11:14 PM, Reynold Xin wrote: I proposed dropping support for Hadoop 1.x in the Spark 2.0 email, and I think everybody is for that. https://issues.apache.org/jira/browse/SPARK-11807 Sean suggested also dropping support for Hadoop 2.2, 2.3, and 2.4. That is to say,

RE: new datasource

2015-11-19 Thread Cheng, Hao
I think you probably need to write some code, since you need to support ES. As I understand it, there are 2 options: Create a new Data Source from scratch, but you probably need to override the interface at:

Re: Removing the Mesos fine-grained mode

2015-11-19 Thread Dean Wampler
Sounds like the right move. Simplifies things in important ways. Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition (O'Reilly) Typesafe @deanwampler http://polyglotprogramming.com On

RE: new datasource

2015-11-19 Thread james.gre...@baesystems.com
Thanks Hao I have written a new Data Source based on ParquetRelation and I have just retested what I had said about not getting anything extra when I change it over to a CatalystScan instead of PrunedFilteredScan and ooops it seems to work fine. From: Cheng, Hao

Re: Removing the Mesos fine-grained mode

2015-11-19 Thread Heller, Chris
I was one who argued for fine-grained mode, and there is something I still appreciate about how fine-grained mode operates in terms of the way one would define a Mesos framework. That said, with dyn-allocation and Mesos support for resource reservation, oversubscription and revocation, I