Hi,
I have an RDD with the following structure:
row1: key: Seq[a, b]; value: value1
row2: key: Seq[a, c, f]; value: value2
Is there an efficient way to flatten the rows into the following?
row1: key: a; value: value1
row2: key: a; value: value2
row3: key: b; value: value1
row4: key: c; value: value2
row5: key: f; value: value2
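One efficient way is flatMap, which expands each key sequence into one
(key, value) pair per element. A minimal sketch in Scala, assuming the data
is an RDD[(Seq[String], String)] (the names rdd and flattened are mine):

  val rdd = sc.parallelize(Seq(
    (Seq("a", "b"), "value1"),
    (Seq("a", "c", "f"), "value2")))

  // Emit one (key, value) pair per element of each key sequence.
  val flattened = rdd.flatMap { case (keys, value) =>
    keys.map(k => (k, value))
  }

  flattened.collect()
  // Array((a,value1), (b,value1), (a,value2), (c,value2), (f,value2)),
  // modulo row order.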
-- Original Message --
*From:* Bill Q <bill.q@gmail.com>
*Sent:* Wednesday, May 20, 2015, 10:13 PM
*To:* donhoff_h <165612...@qq.com>
I have a similar problem: I cannot pass the HBase configuration directory as
an extra classpath to Spark any more using
spark.executor.extraClassPath=MY_HBASE_CONF_DIR in Spark 1.3. We used to run
this in 1.2 without any problem.
On Tuesday, May 19, 2015, donhoff_h <165612...@qq.com> wrote:
Hi,
We have some Spark jobs that ran well under Spark 1.2 using spark-submit
--conf spark.executor.extraClassPath=/etc/hbase/conf, and the Java HBase
driver code that Spark called could pick up the HBase settings such as the
ZooKeeper addresses.
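For reference, the working 1.2-style invocation described above would look
roughly like this (the class and jar names are placeholders, not from the
thread):

  spark-submit \
    --class com.example.MyHBaseJob \
    --conf spark.executor.extraClassPath=/etc/hbase/conf \
    my-hbase-job.jar

With /etc/hbase/conf on the executor classpath, the HBase client reads
hbase-site.xml from there, including the ZooKeeper quorum.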
But after upgrading to CDH 5.4.1 (Spark 1.3), the Spark code can no longer
pick up those HBase settings.
RDD1 = RDD.filter(...)
RDD2 = RDD.filter(...)
*From:* Bill Q [mailto:bill.q@gmail.com]
*Sent:* Thursday, May 7, 2015 4:55 PM
*To:* Evo Eftimov
*Cc:* user@spark.apache.org

*From:* Bill Q [mailto:bill.q@gmail.com]
*Sent:* Tuesday, May 5, 2015 10:42 PM
*To:* user@spark.apache.org
*Subject:* Map one RDD into two RDD
Hi all,
I have a large RDD that I map a function over. Based on the nature of each
record in the input RDD, I will generate two types of data. I would like to
save each type into its own RDD, but I can't seem to find an efficient way
to do it. Any suggestions?
Many thanks.
Bill
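One common workaround, sketched below in Scala (classify and transform are
assumed, hypothetical functions): run the expensive map once, tag each
output record with its type, cache the result, and filter the cached RDD
twice.

  // Tag each transformed record with a boolean type marker; cache() makes
  // sure the transform runs only once even though we read the RDD twice.
  val tagged = inputRdd.map(r => (classify(r), transform(r))).cache()

  // Two cheap passes over the cached data, one per output type.
  val typeA = tagged.filter { case (isA, _) => isA }.map(_._2)
  val typeB = tagged.filter { case (isA, _) => !isA }.map(_._2)

This matches the two-filter suggestion quoted above, with cache() added so
the source RDD is not recomputed on the second pass.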
Hi,
I am trying to make Spark SQL 1.1 work to replace part of our ETL processes
that are currently done by Hive 0.12.
A common problem I have encountered is the "Too many open files" error; once
that happens, the query just fails. I started the spark-shell with
ulimit -n 4096.
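That is, roughly (a sketch; the raised limit is inherited by processes
launched from the same shell):

  ulimit -n 4096    # raise the per-process open-file limit for this shell
  ./bin/spark-shell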
The sort-based shuffle might help in this regard.
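In Spark 1.1 the sort-based shuffle exists but is not the default; a sketch
of enabling it when launching the shell:

  # Hash-based shuffle (the 1.1 default) can create a shuffle file per map
  # task per reducer; sort-based shuffle writes one sorted file per map task.
  ./bin/spark-shell --conf spark.shuffle.manager=sort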
On Fri, Oct 31, 2014 at 3:25 PM, Bill Q <bill.q@gmail.com> wrote: