Re: Will spark cache table once even if I call read/cache on the same table multiple times

2016-11-20 Thread Taotao.Li
Hi, you can check my Stack Overflow question: http://stackoverflow.com/questions/36195105/what-happens-if-i-cache-the-same-rdd-twice-in-spark/36195812#36195812 On Sat, Nov 19, 2016 at 3:16 AM, Rabin Banerjee < dev.rabin.baner...@gmail.com> wrote: > Hi Yong, > > But every time val tabdf =

Re: Hi, guys, does anyone use Spark in finance market?

2016-09-01 Thread Taotao.Li
figure out *why* the price is likely to change, how much by and in which > direction. > > I agree that Apache Spark can be just the right tool for doing the heavy > lifting required for analysis, computation and modelling of big data so > looking forward to future Spark work in

Hi, guys, does anyone use Spark in finance market?

2016-08-30 Thread Taotao.Li
Hi, guys, I'm a quant engineer in China, and I believe using Spark in the financial market is very promising. But I haven't found any cases that combine Spark and finance. So here I want to do a small survey: - do you guys use Spark in financial-market-related projects? - if yes,

Re: [Community] Python support added to Spark Job Server

2016-08-20 Thread Taotao.Li
Awesome. One question about the job server: when will it support Spark 2.x? Many thanks~ On Thu, Aug 18, 2016 at 1:04 AM, Evan Chan wrote: > Hi folks, > > Just a friendly message that we have added Python support to the REST > Spark Job Server project. If you are a

Does Spark SQL support indexes?

2016-08-13 Thread Taotao.Li
Hi, guys, does Spark SQL support indexes? If so, how can I create an index on my temp table? If not, how can I handle specific queries on a very large table? A query would scan the whole table even though all I want is a small piece of it. Many thanks,

Re: spark and plot data

2016-07-22 Thread Taotao.Li
Hi, pesudo, I've posted a blog before, spark-dataframe-introduction, and for me, I use Spark DataFrames [or RDDs] to do the logic calculation on all the datasets, then transform the result into a pandas DataFrame, and make

Re: Understanding spark concepts cluster, master, slave, job, stage, worker, executor, task

2016-07-21 Thread Taotao.Li
-paper <http://litaotao.github.io/spark-resouces-blogs-paper?s=gmail> On Thu, Jul 21, 2016 at 12:19 PM, Sachin Mittal <sjmit...@gmail.com> wrote: > Hi, > Thanks for the links, is there any english translation for the same? > > Sachin > > > On Thu, Jul 21, 2016

Re: Understanding spark concepts cluster, master, slave, job, stage, worker, executor, task

2016-07-20 Thread Taotao.Li
Hi, Sachin, here are two posts about the basic concepts of Spark: - spark-questions-concepts - deep-into-spark-exection-model And, I fully recommend

Re: RDD and Dataframes

2016-07-15 Thread Taotao.Li
Hi, brccosta, Databricks has just posted a blog about *RDD, Dataframe and Dataset*. You can check it here: https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html , which I think will be very helpful for you.

Re: Saving data frames on Spark Master/Driver

2016-07-14 Thread Taotao.Li
Hi, consider converting the DataFrame to an RDD and then using *rdd.toLocalIterator* to collect data on the driver node. On Fri, Jul 15, 2016 at 9:05 AM, Pedro Rodriguez wrote: > Out of curiosity, is there a way to pull all the data back to the driver > to save without collect()?
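The difference between collect() and toLocalIterator suggested in the reply above can be sketched with a toy model in plain Python. This is not Spark code: `make_partitions`, `collect` and `to_local_iterator` are hypothetical stand-ins written for illustration, and a "partition" here is just a function that produces its rows on demand. The point it tries to show is that collecting materializes every partition on the driver at once, while a local iterator only needs one partition in memory at a time.

```python
from typing import Callable, Iterator, List

# Hypothetical model: a partition is a zero-argument function that
# computes its rows only when it is called.
Partition = Callable[[], List[int]]


def make_partitions(n_parts: int, rows_per_part: int) -> List[Partition]:
    """Build n_parts lazy partitions holding consecutive integers."""
    return [
        (lambda start=i * rows_per_part: list(range(start, start + rows_per_part)))
        for i in range(n_parts)
    ]


def collect(parts: List[Partition]) -> List[int]:
    """Toy collect(): every row of every partition ends up in one list."""
    rows: List[int] = []
    for load in parts:
        rows.extend(load())
    return rows


def to_local_iterator(parts: List[Partition]) -> Iterator[int]:
    """Toy toLocalIterator: load one partition, yield its rows, move on."""
    for load in parts:
        yield from load()


parts = make_partitions(n_parts=4, rows_per_part=3)
all_rows = collect(parts)              # whole dataset held at once
total = sum(to_local_iterator(parts))  # streamed partition by partition
```

In this sketch both paths see the same rows in the same order; only the amount held in memory at one time differs. With the real API the trade-off is that the iterator fetches partitions one after another, so it is usually slower than a single collect().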

Re: Spark 2.0 Release Date

2016-07-09 Thread Taotao.Li
docs I found and unreleased officially: - 2.0.0-preview: http://spark.apache.org/docs/2.0.0-preview/ - master-docs : http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/index.html - 2.0.0 docs:

pre-install 3-party Python package on spark cluster

2016-01-10 Thread taotao.li
I have a Spark cluster, from machine-1 to machine-100, and machine-1 acts as the master. Then one day my program needed to use a third-party Python package which is not installed on every machine of the cluster. So here comes my problem: to make that third-party Python package usable on master and slaves,

Re: rdd.cache() not working ?

2015-04-01 Thread Taotao.Li
Rerun person.count and you will see the effect of the cache. person.cache does not cache the RDD immediately; the RDD is actually cached after the first action [person.count here]. - Original Message - From: fightf...@163.com To: user user@spark.apache.org Sent: Wednesday, April 1, 2015, 1:21:25 PM Subject:
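The lazy behaviour described in the reply above can be illustrated with a small toy model in plain Python. This is a sketch of the idea only, not Spark's implementation or API: `ToyRDD` and its `compute_calls` counter are hypothetical names invented for this example. cache() merely records that the data should be kept, and the first action is what actually computes and stores it.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not the Spark class)."""

    def __init__(self, compute):
        self._compute = compute        # function that produces the rows
        self._cache_requested = False
        self._cached_rows = None
        self.compute_calls = 0         # how many times rows were recomputed

    def cache(self):
        # Like rdd.cache(): only marks the data for caching, computes nothing.
        self._cache_requested = True
        return self

    def _rows(self):
        if self._cached_rows is not None:
            return self._cached_rows   # served from the cache
        self.compute_calls += 1
        rows = list(self._compute())
        if self._cache_requested:
            self._cached_rows = rows   # the cache is filled by the first action
        return rows

    def count(self):
        # An action: forces evaluation of the rows.
        return len(self._rows())


person = ToyRDD(lambda: range(1000)).cache()
calls_after_cache = person.compute_calls  # cache() alone did no work
first = person.count()                    # computes the rows and stores them
second = person.count()                   # reuses the stored rows
```

After both counts the rows have been computed only once, which is why rerunning the same action is the way to observe the speed-up.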

Re: Data/File structure Validation

2015-03-23 Thread Taotao.Li
Can it load successfully if the format is invalid? - Original Message - From: Ahmed Nawar ahmed.na...@gmail.com To: user@spark.apache.org Sent: Monday, March 23, 2015, 4:48:54 PM Subject: Data/File structure Validation Dears, Is there any way to validate the CSV, Json ... Files while loading to