Compute the global rank of the column

2016-05-31 Thread Dai, Kevin
Hi, all, I want to compute the rank of a column in a table. Currently I use a window function to do it, but then all the data ends up in one partition. Is there a better solution? Regards, Kevin.
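
One way to avoid the single-partition window: sortBy keeps the data distributed, and zipWithIndex then assigns a global position without coalescing everything into one partition. A minimal sketch in Scala (key and value names are illustrative, not from the thread; note this gives row_number-style positions, so ties get distinct ranks):

    import org.apache.spark.SparkContext

    // Sketch: global rank without a one-partition window.
    def globalRank(sc: SparkContext): Unit = {
      val rows = sc.parallelize(Seq(("a", 3.0), ("b", 1.0), ("c", 2.0)))
      val ranked = rows
        .sortBy { case (_, v) => v } // stays distributed across partitions
        .zipWithIndex()              // stable 0-based global position
        .map { case ((k, v), idx) => (k, v, idx + 1) } // 1-based rank
      ranked.collect().foreach(println)
    }

zipWithIndex runs an extra job to count the elements per partition, but that is far cheaper than shuffling the whole table into one partition.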

java.lang.OutOfMemoryError: Direct buffer memory when using broadcast join

2016-03-21 Thread Dai, Kevin
Hi, all, I'm joining a small table (about 200 MB) with a huge table using a broadcast join; however, Spark throws the following exception: 16/03/20 22:32:06 WARN TransportChannelHandler: Exception in connection from java.lang.OutOfMemoryError: Direct buffer memory at
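
Direct-buffer OOMs during a broadcast usually point at the JVM's direct-memory cap being too small for Netty's transfer of the broadcast blocks. A hedged sketch of the settings commonly raised in this situation (the 512m value is illustrative, not a verified fix for this particular trace):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("broadcast-join")
      // Netty allocates direct buffers; lift the per-JVM cap.
      .set("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=512m")
      .set("spark.driver.extraJavaOptions", "-XX:MaxDirectMemorySize=512m")
      // Auto-broadcast tables only up to ~200 MB (value in bytes).
      .set("spark.sql.autoBroadcastJoinThreshold", (200 * 1024 * 1024).toString)
    val sc = new SparkContext(conf)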

RE: Use pig load function in spark

2015-03-23 Thread Dai, Kevin
-avro, and spark-csv (https://github.com/databricks/spark-csv). Thanks, Yin On Mon, Mar 23, 2015 at 7:14 PM, Dai, Kevin yun...@ebay.com wrote: Hi, Paul, you are right. The story is that we have a lot of Pig load functions to load our different data, and now we want to use Spark
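
A minimal sketch of the spark-csv route the reply points at, using the read API of later Spark versions (the path and the header option are placeholders):

    import org.apache.spark.sql.{DataFrame, SQLContext}

    // Sketch: load CSV through the spark-csv data source.
    def loadCsv(sqlContext: SQLContext): DataFrame =
      sqlContext.read
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .load("hdfs:///path/to/data.csv")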

Use pig load function in spark

2015-03-23 Thread Dai, Kevin
Hi, all, can Spark use Pig's load functions to load data? Best Regards, Kevin.

RE: Use pig load function in spark

2015-03-23 Thread Dai, Kevin
From: Paul Brown [mailto:p...@mult.ifario.us] Sent: 2015-03-24 4:11 To: Dai, Kevin Subject: Re: Use pig load function in spark The answer is Maybe, but you probably don't want to do that. A typical Pig load function is devoted to bridging external data into Pig's type system, but you don't
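
Following Paul's point, the usual alternative is to skip the LoadFunc and read through the Hadoop InputFormat it wraps. A sketch with TextInputFormat as a stand-in; substitute whatever format your LoadFunc actually delegates to:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Sketch: bypass the Pig LoadFunc and use the underlying InputFormat.
    def loadViaInputFormat(sc: SparkContext): RDD[String] =
      sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](
          "hdfs:///path/to/input")
        .map { case (_, line) => line.toString }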

RE: A strange problem in spark sql join

2015-03-09 Thread Dai, Kevin
No, I don't have two master instances. From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: 2015-03-09 15:03 To: Dai, Kevin Cc: user@spark.apache.org Subject: Re: A strange problem in spark sql join Make sure you don't have two master instances running on the same machine. It could happen

A strange problem in spark sql join

2015-03-09 Thread Dai, Kevin
Hi, guys, I encountered a strange problem: I joined two tables (both Parquet files) and then did a groupBy, and the groupBy took 19 hours to finish. However, when I killed this job twice in the groupBy stage, the third try would succeed. But after I killed this job and ran it again, it

RE: Implement customized Join for SparkSQL

2015-01-09 Thread Dai, Kevin
Best Regards, Kevin From: Rishi Yadav [mailto:ri...@infoobjects.com] Sent: 2015-01-09 6:52 To: Dai, Kevin Cc: user@spark.apache.org Subject: Re: Implement customized Join for SparkSQL Hi Kevin, Say A has 10 ids, so you are pulling data from B's data source only for these 10 ids? What if you

Implement customized Join for SparkSQL

2015-01-05 Thread Dai, Kevin
Hi, all, suppose I want to join two tables A and B as follows: SELECT * FROM A JOIN B ON A.id = B.id. A is a file, while B is a database indexed by id, which I wrapped with the data source API. The desired join flow is: 1. generate A's RDD[Row]; 2. generate B's RDD[Row] from A by
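
The flow described here is essentially a lookup join: drive B's reads from A's ids. A minimal sketch with mapPartitions, where fetchFromB is a hypothetical stand-in for the indexed lookup behind the data source wrapper (one query per partition of A):

    import org.apache.spark.rdd.RDD

    // Sketch: for each partition of A, fetch only the matching ids from B.
    def lookupJoin[A, B](
        aRdd: RDD[(Long, A)],
        fetchFromB: Seq[Long] => Map[Long, B]): RDD[(Long, (A, B))] =
      aRdd.mapPartitions { iter =>
        val rows = iter.toSeq
        val bSide = fetchFromB(rows.map(_._1).distinct) // one lookup per partition
        rows.iterator.collect {
          case (id, a) if bSide.contains(id) => (id, (a, bSide(id)))
        }
      }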

Window function by Spark SQL

2014-12-04 Thread Dai, Kevin
Hi, all, how can I group by one column, order by another, and then select the first row of each group (just what a window function does) with Spark SQL? Best Regards, Kevin.
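
Spark SQL only grew window functions later (1.4), so at the time the usual answer was to do this on the RDD side: reduceByKey keeping, per group, the row with the smallest ordering column. A sketch with illustrative field names:

    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD

    case class Row3(group: String, order: Int, payload: String)

    // Sketch: first row per group, ordered by `order`, without windows.
    def firstPerGroup(rows: RDD[Row3]): RDD[Row3] =
      rows
        .keyBy(_.group)
        .reduceByKey((a, b) => if (a.order <= b.order) a else b)
        .values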

Transform RDD.groupBY result to multiple RDDs

2014-11-19 Thread Dai, Kevin
Hi, all, suppose I have an RDD of (K, V) tuples and I do a groupBy on the key K. My question is how to make each groupBy result, which is (K, Iterable[V]), an RDD of its own. By the way, can we transform it into a DStream in which each groupBy result is an RDD? Best Regards, Kevin.
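
There is no built-in operator that splits one RDD into many by key; the usual sketch collects the distinct keys and runs one filter pass per key, which is workable for a small, known key set but costs one scan of the data per key:

    import scala.reflect.ClassTag
    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD

    // Sketch: one RDD per key, via a filter per key.
    def splitByKey[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)]): Map[K, RDD[V]] = {
      val keys = rdd.keys.distinct().collect()
      keys.map(k => k -> rdd.filter(_._1 == k).values).toMap
    }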

Is there setup and cleanup function in spark?

2014-11-13 Thread Dai, Kevin
Hi, all, does Spark have setup and cleanup functions, as in Hadoop MapReduce, for doing some initialization and cleanup work? Best Regards, Kevin.
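
The common substitute is mapPartitions: open the resource once per partition, process the partition's iterator, and close the resource afterwards. A sketch with a hypothetical Conn resource (the partition is materialized before close so cleanup cannot run while elements are still being pulled lazily):

    import org.apache.spark.rdd.RDD

    // Hypothetical per-partition resource, for illustration only.
    trait Conn { def lookup(s: String): String; def close(): Unit }

    // Sketch: emulate Hadoop's setup()/cleanup() with mapPartitions.
    def withConn(rdd: RDD[String], open: () => Conn): RDD[String] =
      rdd.mapPartitions { iter =>
        val conn = open()                      // setup, once per partition
        val out = iter.map(conn.lookup).toList // materialize before cleanup
        conn.close()                           // cleanup, once per partition
        out.iterator
      }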

ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId not found

2014-10-31 Thread Dai, Kevin
Hi, all, my job failed, and there are a lot of "ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId not found" messages in the log. Can anyone tell me what's wrong and how to fix it? Best Regards, Kevin.

Use RDD like a Iterator

2014-10-28 Thread Dai, Kevin
Hi, all, I have an RDD[T]; can I use it like an iterator? That is, can I compute every element of this RDD lazily? Best Regards, Kevin.
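
RDD.toLocalIterator does roughly this: it streams one partition at a time to the driver, so elements can be consumed lazily without a full collect(). A minimal sketch (it runs one job per partition, trading latency for driver memory):

    import org.apache.spark.rdd.RDD

    // Sketch: consume an RDD element by element on the driver.
    def consumeLazily[T](rdd: RDD[T])(f: T => Unit): Unit =
      rdd.toLocalIterator.foreach(f)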

SchemaRDD Convert

2014-10-22 Thread Dai, Kevin
Hi, all, I have an RDD of a case class T, which contains several primitive types and a Map. How can I convert this to a SchemaRDD? Best Regards, Kevin.
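
With the Spark 1.x API, the implicit createSchemaRDD conversion on SQLContext infers the schema from the case class; the Map field should come out as a MapType, assuming the reflection-based inference of that era handles it. A sketch with an illustrative case class:

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.{SchemaRDD, SQLContext}

    // Illustrative case class: primitives plus a Map field.
    case class Record(id: Int, name: String, props: Map[String, String])

    def toSchemaRDD(sc: SparkContext): SchemaRDD = {
      val sqlContext = new SQLContext(sc)
      import sqlContext.createSchemaRDD // implicit RDD[Product] => SchemaRDD

      val rdd = sc.parallelize(Seq(Record(1, "a", Map("k" -> "v"))))
      val schemaRdd: SchemaRDD = rdd // conversion happens here
      schemaRdd.registerTempTable("records")
      schemaRdd
    }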

Convert Iterable to RDD

2014-10-20 Thread Dai, Kevin
Hi, all, is there any way to convert an Iterable to an RDD? Thanks, Kevin.

RE: Convert Iterable to RDD

2014-10-20 Thread Dai, Kevin
In addition, how do I convert Iterable[Iterable[T]] to RDD[T]? Thanks, Kevin. From: Dai, Kevin [mailto:yun...@ebay.com] Sent: 2014-10-21 10:58 To: user@spark.apache.org Subject: Convert Iterable to RDD Hi, all, is there any way to convert an Iterable to an RDD? Thanks, Kevin.
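
A sketch covering both questions in the thread. parallelize takes a Seq, so the Iterable is materialized on the driver first; that is fine only for data that fits in driver memory:

    import scala.reflect.ClassTag
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Sketch: Iterable[T] => RDD[T].
    def toRdd[T: ClassTag](sc: SparkContext, it: Iterable[T]): RDD[T] =
      sc.parallelize(it.toSeq)

    // Sketch: Iterable[Iterable[T]] => RDD[T], flattening on the driver.
    def flattenToRdd[T: ClassTag](sc: SparkContext,
                                  nested: Iterable[Iterable[T]]): RDD[T] =
      sc.parallelize(nested.flatten.toSeq)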

Interactive interface tool for spark

2014-10-08 Thread Dai, Kevin
Hi, all, we need an interactive interface tool for Spark in which we can run Spark jobs and plot graphs to explore the data interactively. The IPython notebook is good, but it only supports Python (we want one supporting Scala)... BR, Kevin.