date:20151104

RE: Sort Merge Join from the filesystem

2015-11-04 Thread Cheng, Hao

Yes, we probably need more changes to the data source API if we want to implement it in a generic way. BTW, I created the JIRA by copying most of the wording from Alex. ☺ https://issues.apache.org/jira/browse/SPARK-11512 From: Reynold Xin [mailto:r...@databricks.com] Sent: Thursday, November 5, 2015

Re: How to force statistics calculation of Dataframe?

2015-11-04 Thread Reynold Xin

Can you use the broadcast hint? E.g. df1.join(broadcast(df2)). The broadcast function is in org.apache.spark.sql.functions. On Wed, Nov 4, 2015 at 10:19 AM, Charmee Patel wrote: > Hi, > > If I have a hive table, analyze table compute statistics will ensure Spark > SQL has
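To illustrate what the broadcast hint asks for, here is a minimal pure-Python sketch of a broadcast hash join. It is not Spark code: the function name `broadcast_hash_join`, the sample tables, and the list-of-partitions representation are all assumptions made for this example. The idea is that the small side is turned into a hash table once and every partition of the large side probes it locally, so the large side does not need to be shuffled.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join each partition of the large side against a replicated small side.

    Hypothetical helper for illustration only; Spark's own implementation differs.
    """
    # Build the hash table once. In a cluster this is the structure that
    # would be shipped ("broadcast") to every executor.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined = []
    for partition in large_partitions:  # each partition is probed independently
        for left in partition:
            # .get avoids inserting empty entries for keys with no match
            for right in lookup.get(left[key], []):
                joined.append({**right, **left})
    return joined


# Made-up sample data: two partitions of a large table and one small table.
orders = [
    [{"user_id": 1, "amount": 30}, {"user_id": 2, "amount": 5}],
    [{"user_id": 1, "amount": 12}, {"user_id": 3, "amount": 7}],
]
users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]

result = broadcast_hash_join(orders, users, "user_id")
```

The order with `user_id` 3 has no matching user and is dropped, as in any inner join. The approach only pays off when the small side fits comfortably in memory on every worker.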

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-04 Thread Egor Pahomov

+1. Things our infrastructure uses that I checked: dynamic allocation, Spark ODBC server, reading JSON, writing Parquet, SQL queries (Hive context), running on CDH. 2015-11-04 9:03 GMT-08:00 Sean Owen : > As usual the signatures and licenses and so on look fine. I continue >

Why LibSVMRelation and CsvRelation don't extends HadoopFsRelation ?

2015-11-04 Thread Jeff Zhang

I'm not sure of the reason; it seems LibSVMRelation and CsvRelation could extend HadoopFsRelation and leverage its features. Is there any other consideration behind that? -- Best Regards Jeff Zhang

RE: dataframe slow down with tungsten turn on

2015-11-04 Thread Cheng, Hao

BTW, 1 minute vs. 2 hours seems quite weird. Can you provide more information on the ETL work? From: Cheng, Hao [mailto:hao.ch...@intel.com] Sent: Thursday, November 5, 2015 12:56 PM To: gen tang; dev@spark.apache.org Subject: RE: dataframe slow down with tungsten turn on 1.5 has critical

RE: Why LibSVMRelation and CsvRelation don't extends HadoopFsRelation ?

2015-11-04 Thread Cheng, Hao

Probably two reasons: 1. HadoopFsRelation was introduced in 1.4, but it seems CsvRelation was created based on 1.3. 2. HadoopFsRelation introduces the concept of Partition, which is probably not necessary for LibSVMRelation. But I think it will be easy to change as extending from

Re: Why LibSVMRelation and CsvRelation don't extends HadoopFsRelation ?

2015-11-04 Thread Jeff Zhang

Thanks Hao. I have already made it extend HadoopFsRelation and it works. I will create a JIRA for that. Besides that, I noticed that in DataSourceStrategy, Spark builds the physical plan based on the trait of the BaseRelation in pattern matching (e.g. CatalystScan, TableScan, HadoopFsRelation). That

RE: Why LibSVMRelation and CsvRelation don't extends HadoopFsRelation ?

2015-11-04 Thread Cheng, Hao

I think you’re right, we do offer the opportunity for developers to make mistakes while implementing a new data source. Here we assume that the new relation MUST NOT extend more than one of the traits CatalystScan, TableScan, PrunedScan, PrunedFilteredScan, etc.; otherwise it will cause
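The concern about a relation mixing in several scan traits can be sketched in plain Python. This is an analogy, not Spark's DataSourceStrategy: the marker classes reuse the trait names from the thread, but the check order in `plan_scan`, the helper `scan_traits`, and the example relations `CsvLike` and `Ambiguous` are hypothetical. The point is that when a planner dispatches on the first matching trait, a relation that carries two traits is silently planned by whichever check comes first.

```python
class BaseRelation:
    """Stand-in for the base type every relation derives from."""


class TableScan:
    """Marker: relation can only return all rows and columns."""


class PrunedScan:
    """Marker: relation can drop unneeded columns."""


class PrunedFilteredScan:
    """Marker: relation can drop columns and push down filters."""


# Hypothetical check order, most capable trait first. The real planner's
# order is not shown in the thread and may differ.
SCAN_TRAITS = (PrunedFilteredScan, PrunedScan, TableScan)


def scan_traits(relation):
    """Return the names of all scan traits the relation carries."""
    return [t.__name__ for t in SCAN_TRAITS if isinstance(relation, t)]


def plan_scan(relation):
    """Pick a scan strategy by the first matching trait (first match wins)."""
    matches = scan_traits(relation)
    if not matches:
        raise TypeError("relation implements no known scan trait")
    return matches[0]


class CsvLike(BaseRelation, TableScan):
    pass


class Ambiguous(BaseRelation, TableScan, PrunedScan):
    pass


single = plan_scan(CsvLike())       # only one trait, so no ambiguity
chosen = plan_scan(Ambiguous())     # two traits, outcome depends on check order
conflict = scan_traits(Ambiguous())
```

A defensive variant could raise an error whenever `scan_traits` returns more than one entry, which would turn the unstated "at most one trait" assumption into an explicit check.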

pyspark with pypy not work for spark-1.5.1

2015-11-04 Thread Chang Ya-Hsuan

Hi all, I am trying to run pyspark with pypy. It works with spark-1.3.1 but fails with spark-1.4.1 and spark-1.5.1. My pypy version: $ /usr/bin/pypy --version Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40) [PyPy 2.2.1 with GCC 4.8.4] works with spark-1.3.1 $

Re: PMML version in MLLib

2015-11-04 Thread Fazlan Nazeem

Thanks Owen. Will do. On Wed, Nov 4, 2015 at 5:22 PM, Sean Owen wrote: > I'm pretty sure that attribute is required. I am not sure what PMML > version the code has been written for but would assume 4.2.1. Feel > free to open a PR to add this version to all the output. > >

Re: Codegen In Shuffle

2015-11-04 Thread 牛兆捷

I see. Thanks very much. 2015-11-04 16:25 GMT+08:00 Reynold Xin : > GenerateUnsafeProjection -- projects any internal row data structure > directly into bytes (UnsafeRow). > > > On Wed, Nov 4, 2015 at 12:21 AM, 牛兆捷 wrote: > >> Dear all: >> >> Tungsten

Re: PMML version in MLLib

2015-11-04 Thread Sean Owen

I'm pretty sure that attribute is required. I am not sure what PMML version the code has been written for but would assume 4.2.1. Feel free to open a PR to add this version to all the output. On Wed, Nov 4, 2015 at 11:42 AM, Fazlan Nazeem wrote: > [adding dev] > > On Wed, Nov

Re: Master build fails ?

2015-11-04 Thread Jacek Laskowski

Hi, It appears it's time to switch to my lovely sbt then! Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski On Tue, Nov 3, 2015 at 2:58

Re: Please reply if you use Mesos fine grained mode

2015-11-04 Thread Heller, Chris

We’ve been making use of both. Fine-grained mode makes sense for more ad-hoc workloads, and coarse-grained mode for more job-like loads on a common data set. My preference is the fine-grained mode in all cases, but the overhead associated with its startup and the possibility that an overloaded cluster

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-04 Thread Jean-Baptiste Onofré

+1 (non binding) Just tested with some snippets on my side. Regards JB On 11/04/2015 12:22 AM, Reynold Xin wrote: Please vote on releasing the following candidate as Apache Spark version 1.5.2. The vote is open until Sat Nov 7, 2015 at 00:00 UTC and passes if a majority of at least 3 +1 PMC

Re: Codegen In Shuffle

2015-11-04 Thread Reynold Xin

GenerateUnsafeProjection -- projects any internal row data structure directly into bytes (UnsafeRow). On Wed, Nov 4, 2015 at 12:21 AM, 牛兆捷 wrote: > Dear all: > > Tungsten project has mentioned that they are applying code generation is > to speed up the conversion of data
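As a rough picture of what "projecting a row directly into bytes" means, here is a small Python sketch. It is not Spark's generated code and makes no claim to match the actual UnsafeRow layout: the functions `project_to_bytes` and `read_field`, the single 8-byte null bitmap word, and the restriction to 64-bit integer fields are all simplifying assumptions for this example.

```python
import struct


def project_to_bytes(row):
    """Pack a row of optional 64-bit ints into a flat byte string.

    Assumed layout (illustrative only): one 8-byte null bitmap word,
    followed by one 8-byte little-endian slot per field.
    """
    if len(row) > 64:
        raise ValueError("this sketch supports at most 64 fields")
    null_bits = 0
    slots = []
    for i, value in enumerate(row):
        if value is None:
            null_bits |= 1 << i   # mark field i as null
            slots.append(0)       # null fields still occupy a slot
        else:
            slots.append(value)
    return struct.pack("<Q%dq" % len(row), null_bits, *slots)


def read_field(buf, i):
    """Read field i straight from the bytes without rebuilding the row."""
    (null_bits,) = struct.unpack_from("<Q", buf, 0)
    if null_bits & (1 << i):
        return None
    (value,) = struct.unpack_from("<q", buf, 8 + 8 * i)
    return value


encoded = project_to_bytes([7, None, -3])
```

Because every field sits at a fixed offset, a single field can be read without deserializing the rest, and the byte string can be copied as-is. Variable-length values such as strings would need an extra region and offsets, which this sketch leaves out.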

Codegen In Shuffle

2015-11-04 Thread 牛兆捷

Dear all: The Tungsten project mentions that code generation is applied to speed up the conversion of data from the in-memory binary format to the wire protocol for shuffle. Where can I find the related implementation in the Spark code base? -- *Regards,* *Zhaojie*


17 matches
