Re: Best practises to storing data in Parquet files

2016-08-29 Thread Mich Talebzadeh
…all.
>> How should we store in HDFS (directory structure, ... )?
> Should partition the file into small pieces.
>> On Aug 28, 2016, at 9:43 PM, Kevin Tran <kevin...@gmail.com> wrote:
>> Hi,
>> Does anyone know what is the best practis…

Re: Best practises to storing data in Parquet files

2016-08-28 Thread Chanh Le
…delete a specific time without rebuilding them all.
> How should we store in HDFS (directory structure, ... )?
Should partition the file into small pieces.
> On Aug 28, 2016, at 9:43 PM, Kevin Tran <kevin...@gmail.com> wrote:
> Hi,
> Does anyone know what is the best practises to…
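Chanh's advice — partition so that one time slice can be deleted without rebuilding the rest — can be sketched with a local directory standing in for HDFS. The `event_date` column name and paths are illustrative, not from the thread; on HDFS the delete would be `hdfs dfs -rm -r /data/events/event_date=2016-08-28`.

```shell
#!/bin/sh
# Simulated date-partitioned Parquet layout: dropping one day's data is a
# single directory removal, with no rebuild of the other partitions.
set -eu

base="$(mktemp -d)/events"                      # stand-in for hdfs:///data/events
mkdir -p "$base/event_date=2016-08-28" "$base/event_date=2016-08-29"
touch "$base/event_date=2016-08-28/part-00000.parquet" \
      "$base/event_date=2016-08-29/part-00000.parquet"

rm -r "$base/event_date=2016-08-28"             # delete one specific time slice

ls "$base"                                      # prints: event_date=2016-08-29
```

Spark's `partitionBy` produces exactly this `column=value` directory shape, which is what makes per-slice deletion cheap.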

Re: Best practises to storing data in Parquet files

2016-08-28 Thread Kevin Tran
…a reference architecture which HBase is a part of? Please share with me the best practices you might know, or your favourite designs. Thanks, Kevin.
On Mon, Aug 29, 2016 at 5:18 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi,
> Can you explain about your p…

Re: Best practises to storing data in Parquet files

2016-08-28 Thread Mich Talebzadeh
wrote:
> Hi,
> Does anyone know what is the best practises to store data to parquet file?
> Does parquet file has limit in size ( 1TB ) ?
> Should we use SaveMode.APPEND for long running streaming app ?
> How should we store in HDFS (directory structure, ... )?
> Thanks,
> Kevin.

Best practises to storing data in Parquet files

2016-08-28 Thread Kevin Tran
Hi,
Does anyone know the best practices for storing data in Parquet files? Does a Parquet file have a size limit (1 TB)? Should we use SaveMode.Append for a long-running streaming app? How should we store the data in HDFS (directory structure, ...)?
Thanks,
Kevin.
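The usual answer to these questions (a sketch, not from the thread itself): keep individual files modest rather than anywhere near 1 TB, and let a streaming job append into a time-partitioned layout. The `events` DataFrame, the `event_date` column, and the output path below are all hypothetical; this assumes Spark 2.x on the classpath.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object ParquetWriteSketch {
  // Pure helper: the directory Spark produces for one partition value.
  def partitionDir(base: String, date: String): String =
    s"$base/event_date=$date"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-write-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical micro-batch of (id, event_date, payload) rows.
    val events = Seq(
      (1L, "2016-08-28", "a"),
      (2L, "2016-08-29", "b")
    ).toDF("id", "event_date", "payload")

    events.write
      .mode(SaveMode.Append)          // safe for a long-running streaming app
      .partitionBy("event_date")      // yields .../event_date=2016-08-28/ dirs
      .parquet("hdfs:///data/events") // hypothetical output root

    spark.stop()
  }
}
```

With this layout, each day lands in its own directory (e.g. `hdfs:///data/events/event_date=2016-08-28`), so old data can be dropped per-partition rather than by rewriting one huge file.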

Re: Best practises around spark-scala

2016-08-08 Thread Deepak Sharma
…@gmail.com> wrote:
>> Hi All,
>> Can anyone please give any documents that may be there around spark-scala best practises?
>> -- Thanks
>> Deepak
>> www.bigdatabig.com
>> www.keosha.net

Re: Best practises around spark-scala

2016-08-08 Thread vaquar khan
> …anyone please give any documents that may be there around spark-scala best practises?
> -- Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net

Best practises around spark-scala

2016-08-08 Thread Deepak Sharma
Hi All,
Can anyone please share any documents that may exist on spark-scala best practices?
-- Thanks
Deepak
www.bigdatabig.com
www.keosha.net

Re: Best practises of share Spark cluster over few applications

2016-02-14 Thread Alex Kozlov
Praveen, the mode in which you run Spark (standalone, YARN, Mesos) is determined when you create the SparkContext. You are right that spark-submit and spark-shell create different SparkContexts. In general, …
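Alex's point — the master is fixed at SparkContext creation — means a web service can launch YARN jobs without going through spark-submit, by setting the master programmatically. A minimal sketch, assuming Spark and the YARN client libraries are on the classpath and `HADOOP_CONF_DIR` points at the cluster config; the app name and master string are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object EmbeddedYarnContext {
  // Build the conf up front: the chosen master cannot be changed after the
  // SparkContext has been constructed.
  def buildConf(master: String): SparkConf =
    new SparkConf()
      .setMaster(master)              // e.g. "yarn", "local[*]", "spark://host:7077"
      .setAppName("webservice-jobs")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf("yarn"))
    try {
      // Trivial job to prove the context is live on the cluster.
      println(sc.parallelize(1 to 10).sum())
    } finally sc.stop()
  }
}
```

This is essentially what spark-submit does for you; running it embedded just moves the conf-building into the web service.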

Re: Best practises of share Spark cluster over few applications

2016-02-14 Thread praveen S
Even I was trying to launch Spark jobs from a web service. But I thought you could run Spark jobs in YARN mode only through spark-submit. Is my understanding not correct? Regards, Praveen
On 15 Feb 2016 08:29, "Sabarish Sasidharan" wrote:
> Yes, you can look at…

Re: Best practises of share Spark cluster over few applications

2016-02-14 Thread Sabarish Sasidharan
Yes, you can look at using the capacity scheduler or the fair scheduler with YARN. Both allow using the full cluster when it is idle, and both allow considering CPU plus memory when allocating resources, which is more or less necessary with Spark. Regards, Sab
On 13-Feb-2016 10:11 pm, "Eugene Morozov"…
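Sab's suggestion maps onto a YARN `fair-scheduler.xml` along these lines. The queue names, weights, and timeout are illustrative, not from the thread; the DRF policy is what makes the scheduler consider CPU as well as memory:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: two queues share the cluster; an idle queue's share
     is borrowed by the busy one and reclaimed via preemption. -->
<allocations>
  <queue name="webservice-a">
    <weight>1.0</weight>
    <schedulingPolicy>drf</schedulingPolicy> <!-- DRF: CPU + memory fairness -->
  </queue>
  <queue name="webservice-b">
    <weight>1.0</weight>
    <schedulingPolicy>drf</schedulingPolicy>
  </queue>
  <!-- Seconds a queue waits under its fair share before preempting. -->
  <defaultFairSharePreemptionTimeout>60</defaultFairSharePreemptionTimeout>
</allocations>
```

Preemption must also be switched on in `yarn-site.xml` (`yarn.scheduler.fair.preemption=true`) for the timeout above to have any effect.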

Best practises of share Spark cluster over few applications

2016-02-13 Thread Eugene Morozov
Hi, I have several instances of the same web service that run some ML algos on Spark (both training and prediction) and do some Spark-unrelated work. Each web-service instance creates its own JavaSparkContext, thus they're seen as separate applications by Spark, and thus they're configured…

Re: Best practises of share Spark cluster over few applications

2016-02-13 Thread Jörn Franke
This is possible with YARN. You also need to think about preemption, in case one web service starts doing something and after a while another web service also wants to do something.
> On 13 Feb 2016, at 17:40, Eugene Morozov wrote:
> Hi,
> I have several…

Re: Best practises

2015-11-02 Thread Denny Lee
…
>>> *From:* "Deepak Sharma" <deepakmc...@gmail.com>
>>> *Sent:* Friday, 30 October 2015, 7:23 PM
>>> *To:* "user" <user@spark.apache.org>
>>> *Subject:* Best practises
>>> Hi
>>> I am looking for any blog / doc…

Re: Best practises

2015-11-02 Thread satish chandra j
…m>
> *Sent:* Friday, 30 October 2015, 7:23 PM
> *To:* "user" <user@spark.apache.org>
> *Subject:* Best practises
> Hi
> I am looking for any blog / doc on the developer's best practices if using Spark. I have already looked at the tuning guide on spark.apache.org. …

Re: Best practises

2015-11-02 Thread Sushrut Ikhar
…help us.
>>>> -- Original Message --
>>>> *From:* "Deepak Sharma" <deepakmc...@gmail.com>
>>>> *Sent:* Friday, 30 October 2015, 7:23 PM
>>>> *To:* "user" <user@spark.apac…

Re: Best practises

2015-11-02 Thread Stefano Baghino
…"Deepak Sharma" <deepakmc...@gmail.com>
>> *Sent:* Friday, 30 October 2015, 7:23 PM
>> *To:* "user" <user@spark.apache.org>
>> *Subject:* Best practises
>> Hi
>> I am looking for any blog / doc on the developer's best practices if using S…

Best practises

2015-10-30 Thread Deepak Sharma
Hi I am looking for any blog / doc on the developer's best practices if using Spark .I have already looked at the tuning guide on spark.apache.org. Please do let me know if any one is aware of any such resource. Thanks Deepak

Re: Best practises

2015-10-30 Thread huangzheng
I have the same question. Anyone help us?
-- Original Message --
From: "Deepak Sharma" <deepakmc...@gmail.com>
Sent: Friday, 30 October 2015, 7:23 PM
To: "user" <user@spark.apache.org>
Subject: Best practises
Hi…

Best practises to clean up RDDs for old applications

2015-10-08 Thread Jens Rantil
…/rdd/spark-local-20150903112858-a72d
23M  /var/lib/spark/rdd/spark-local-20150929141201-143f
The applications (such as "20150903112858") aren't running anymore. What are best practises to clean these up? A cron job? Enabling some kind of cleaner in Spark? I'm currently running…
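Both of Jens's options exist: the standalone worker has a built-in cleaner (`spark.worker.cleanup.enabled`, tuned via `spark.worker.cleanup.interval` and `spark.worker.cleanup.appDataTtl`), and a cron job is a common fallback for directories the worker no longer tracks. A sketch of the cron approach; the directory and the 7-day TTL are illustrative:

```shell
#!/bin/sh
# clean_spark_local DIR DAYS: remove spark-local-* dirs under DIR whose mtime
# is older than DAYS days. Intended to be run from cron, e.g.:
#   0 3 * * * clean_spark_local /var/lib/spark/rdd 7
clean_spark_local() {
  dir="$1"; days="$2"
  find "$dir" -maxdepth 1 -type d -name 'spark-local-*' -mtime "+$days" \
    -exec rm -rf {} +
}
```

Only run this against directories of applications that are definitely finished; deleting a live application's spark-local dir will fail its tasks.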