Hi, All
I want to compute the rank of some column in a table.
Currently, I use the window function to do it.
However, all the data ends up in one partition.
Is there a better way to do it?
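For reference, a minimal sketch of the window-function approach being described,
assuming a DataFrame df with a numeric column "value" (all names here are
placeholders):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rank

// Window.orderBy without a partitionBy moves every row into a single partition,
// which is exactly the bottleneck mentioned above.
val ranked = df.withColumn("rank", rank().over(Window.orderBy("value")))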
Regards,
Kevin.
Hi, All
I'm joining a small table (about 200m) with a huge table using a broadcast join;
however, Spark throws the following exception:
16/03/20 22:32:06 WARN TransportChannelHandler: Exception in connection from
java.lang.OutOfMemoryError: Direct buffer memory
at
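For context, a hedged sketch of the kind of broadcast join in question (not the
actual job; the table names and join key are placeholders):

import org.apache.spark.sql.functions.broadcast

// Explicitly mark the small DataFrame for broadcasting; the direct-buffer OOM
// above typically surfaces while the broadcast blocks are being transferred.
val joined = hugeDF.join(broadcast(smallDF), Seq("id"))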
spark-avro, and spark-csv (https://github.com/databricks/spark-csv).
Thanks,
Yin
On Mon, Mar 23, 2015 at 7:14 PM, Dai, Kevin
<yun...@ebay.com> wrote:
Hi, Paul
You are right.
The story is that we have a lot of Pig load functions for loading our different
data, and now we want to use Spark
Hi, all
Can Spark use Pig's load functions to load data?
Best Regards,
Kevin.
From: Paul Brown [mailto:p...@mult.ifario.us]
Sent: March 24, 2015 4:11
To: Dai, Kevin
Subject: Re: Use pig load function in spark
The answer is "maybe", but you probably don't want to do that.
A typical Pig load function is devoted to bridging external data into Pig's
type system, but you don't
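As a rough illustration of the direction this points in (reading the data with
Spark directly rather than through a Pig LoadFunc; the path and the delimited
format are assumptions):

// Plain RDD route: read the raw files and parse them in Spark itself.
val rows = sc.textFile("hdfs:///data/example.tsv").map(_.split("\t"))

// Or, on Spark 1.4+, go through an external data source package such as spark-csv:
// val df = sqlContext.read.format("com.databricks.spark.csv").load("hdfs:///data/example.csv")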
No, I don't have two master instances.
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: March 9, 2015 15:03
To: Dai, Kevin
Cc: user@spark.apache.org
Subject: Re: A strange problem in spark sql join
Make sure you don't have two master instances running on the same machine. It
could happen
Hi, guys
I encountered a strange problem:
I joined two tables (both Parquet files) and then did a groupBy. The groupBy took
19 hours to finish.
However, when I kill this job twice in the groupBy stage, the third try will succeed.
But after I killed this job and ran it again, it
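For reference, a rough sketch of the shape of the job being described (paths and
column names are placeholders, not the actual job):

val a = sqlContext.parquetFile("hdfs:///data/a.parquet")
val b = sqlContext.parquetFile("hdfs:///data/b.parquet")
a.registerTempTable("a")
b.registerTempTable("b")
// Join the two Parquet tables, then aggregate; the groupBy stage is where the
// 19-hour run and the kill/retry behaviour were observed.
val result = sqlContext.sql(
  "SELECT a.id, COUNT(*) FROM a JOIN b ON a.id = b.id GROUP BY a.id")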
Best Regards,
Kevin
From: Rishi Yadav [mailto:ri...@infoobjects.com]
Sent: January 9, 2015 6:52
To: Dai, Kevin
Cc: user@spark.apache.org
Subject: Re: Implement customized Join for SparkSQL
Hi Kevin,
Say A has 10 ids, so you are pulling data from B's data source only for these
10 ids?
What if you
Hi, All
Suppose I want to join two tables A and B as follows:
Select * from A join B on A.id = B.id
A is a file, while B is a database indexed by id, which I wrapped with the Data
Source API.
The desired join flow is:
1. Generate A's RDD[Row]
2. Generate B's RDD[Row] from A by
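One hedged way to sketch that flow (not the actual implementation; lookupByIds is
a hypothetical helper that queries B's indexed source for a batch of ids):

import org.apache.spark.sql.Row

// Assume aRows: RDD[Row] built from file A, with the join key in column 0.
val joined = aRows.mapPartitions { part =>
  val rows = part.toSeq
  val ids = rows.map(_.getString(0)).distinct
  // Hypothetical call: fetch only the matching rows from B for these ids.
  val bById: Map[String, Row] = lookupByIds(ids).map(r => r.getString(0) -> r).toMap
  rows.iterator.flatMap { a =>
    bById.get(a.getString(0)).map(b => (a, b))
  }
}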
Hi, ALL
How can I group by one column, order by another, and then select the first row for
each group (just like a window function does) with SparkSQL?
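A common way to express this with window functions, as a sketch (assuming Spark
1.6+, a DataFrame df, a grouping column "grp" and an ordering column "ts", all
placeholders):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Number the rows within each group by the ordering column, then keep row 1.
val w = Window.partitionBy("grp").orderBy("ts")
val firstPerGroup = df.withColumn("rn", row_number().over(w)).filter("rn = 1")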
Best Regards,
Kevin.
Hi, all
Suppose I have an RDD of (K, V) tuples and I do a groupBy on the key K.
My question is how to make each groupBy result, which is (K, Iterable[V]), an RDD.
BTW, can we transform it into a DStream in which each groupBy result is also an
RDD?
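A rough sketch of one way to get a separate RDD per key after groupByKey, assuming
the set of distinct keys is small enough to collect to the driver (that assumption
is not from the thread):

val grouped = pairs.groupByKey()              // RDD[(K, Iterable[V])]
val keys = grouped.keys.distinct().collect()
// One RDD per key; each filter pass rescans the grouped RDD, so this only makes
// sense for a small number of keys.
val perKey = keys.map(k => k -> grouped.filter(_._1 == k).flatMap(_._2)).toMap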
Best Regards,
Kevin.
Hi, all
Is there a setup and cleanup mechanism in Spark, as in Hadoop MapReduce, for doing
some initialization and cleanup work?
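A common pattern for this, sketched here with hypothetical openConnection,
closeConnection, and process helpers, is mapPartitions: setup runs once before a
partition is processed and cleanup once after:

val results = rdd.mapPartitions { iter =>
  val conn = openConnection()                                  // setup, once per partition
  val out = iter.map(record => process(conn, record)).toList   // force evaluation
  closeConnection(conn)                                        // cleanup, once per partition
  out.iterator
}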
Best Regards,
Kevin.
Hi, all
My job failed and there are a lot of "ERROR ConnectionManager: Corresponding
SendingConnection to ConnectionManagerId not found" messages in the log.
Can anyone tell me what's wrong and how to fix it?
Best Regards,
Kevin.
Hi, ALL
I have an RDD[T]; can I use it like an iterator?
That is, can I compute every element of this RDD lazily?
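One thing that comes close is RDD.toLocalIterator, sketched here; it streams one
partition at a time to the driver:

// Partitions are computed lazily as the iterator is consumed, so the driver only
// needs to hold one partition's worth of data at a time.
val it = rdd.toLocalIterator
it.take(10).foreach(println)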
Best Regards,
Kevin.
Hi, ALL
I have an RDD of a case class T, and T contains several primitive types and a Map.
How can I convert this to a SchemaRDD?
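A sketch of the usual Spark 1.x route, via the implicit conversion for RDDs of case
classes (the case class and its fields below are placeholders):

import org.apache.spark.sql.{SQLContext, SchemaRDD}

case class T(id: Int, name: String, props: Map[String, String])

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD   // implicit RDD[case class] -> SchemaRDD

val rdd = sc.parallelize(Seq(T(1, "a", Map("k" -> "v"))))
val schemaRDD: SchemaRDD = rdd      // the Map field becomes a MapType column
schemaRDD.registerTempTable("t")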
Best Regards,
Kevin.
Hi, All
Is there any way to convert an Iterable to an RDD?
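A minimal sketch, assuming the Iterable (myIterable here) fits in driver memory:

// parallelize takes a Seq, so materialize the Iterable first.
val rdd = sc.parallelize(myIterable.toSeq)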
Thanks,
Kevin.
In addition, how can I convert an Iterable[Iterable[T]] to an RDD[T]?
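Same idea, flattening on the driver first (again a sketch, assuming everything
fits in driver memory):

val rdd = sc.parallelize(nested.flatten.toSeq)   // nested: Iterable[Iterable[T]]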
Thanks,
Kevin.
From: Dai, Kevin [mailto:yun...@ebay.com]
Sent: October 21, 2014 10:58
To: user@spark.apache.org
Subject: Convert Iterable to RDD
Hi, All
Is there any way to convert an Iterable to an RDD?
Thanks,
Kevin.
Hi, All
We need an interactive interface tool for Spark in which we can run Spark jobs and
plot graphs to explore the data interactively.
The IPython notebook is good, but it only supports Python (we want one supporting
Scala)...
BR,
Kevin.