Create multiple rows from elements in array on a single row

2015-06-08 Thread Bill Q
Hi, I have an RDD with the following structure:
row1: key: Seq[a, b]; value: value1
row2: key: Seq[a, c, f]; value: value2
Is there an efficient way to flatten the rows into:
row1: key: a; value: value1
row2: key: a; value: value2
row3: key: b; value: value1
row4: key: c; value: value2
row5:
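The usual answer (not quoted in this truncated snippet) is a `flatMap` that expands each `(Seq[key], value)` row into one `(key, value)` pair per key element. A minimal sketch on plain Scala collections, whose `flatMap` has the same semantics as `RDD.flatMap`; the data values mirror the thread's example:

```scala
// Each (Seq[key], value) row expands into one (key, value) row per element.
// On an RDD the identical lambda applies: rdd.flatMap { case (keys, v) => keys.map(k => (k, v)) }
val rows = Seq(
  (Seq("a", "b"), "value1"),
  (Seq("a", "c", "f"), "value2")
)
val flat = rows.flatMap { case (keys, v) => keys.map(k => (k, v)) }
// flat: Seq((a,value1), (b,value1), (a,value2), (c,value2), (f,value2))
```

Output order within each input row follows the key sequence; if a different row ordering is needed afterwards, a `sortByKey` (on an RDD) can follow.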

Re: How to use spark to access HBase with Security enabled

2015-05-21 Thread Bill Q
.conf=/etc/krb5.conf /home/spark/myApps/TestHBase.jar -- Original Message -- *From:* Bill Q bill.q@gmail.com *Sent:* Wednesday, May 20, 2015, 10:13 PM *To:* donhoff_h 165612...@qq.com

Re: How to use spark to access HBase with Security enabled

2015-05-20 Thread Bill Q
I have a similar problem: I can no longer pass the HBase configuration directory as an extra classpath to Spark using spark.executor.extraClassPath=MY_HBASE_CONF_DIR in Spark 1.3. We used to run this in 1.2 without any problem. On Tuesday, May 19, 2015, donhoff_h 165612...@qq.com wrote: Sorry,

Spark 1.3 classPath problem

2015-05-19 Thread Bill Q
Hi, We have some Spark jobs that ran well under Spark 1.2 using spark-submit --conf spark.executor.extraClassPath=/etc/hbase/conf, and the Java HBase driver code that Spark called could pick up HBase settings such as the ZooKeeper addresses. But after upgrading to CDH 5.4.1 with Spark 1.3, the Spark code
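For context, a sketch of the kind of invocation the thread describes. The jar name and class are illustrative, not from the thread; the `--conf` and `--driver-class-path` flags are standard spark-submit options, and shipping `hbase-site.xml` via `--files` is a commonly suggested alternative when the executor classpath setting stops being honored:

```shell
# 1.2-era style: put the HBase conf dir on both driver and executor classpaths.
# etl-job.jar and com.example.EtlJob are hypothetical placeholders.
spark-submit \
  --class com.example.EtlJob \
  --driver-class-path /etc/hbase/conf \
  --conf spark.executor.extraClassPath=/etc/hbase/conf \
  etl-job.jar

# Alternative workaround: ship the HBase site file to executors explicitly.
spark-submit \
  --class com.example.EtlJob \
  --files /etc/hbase/conf/hbase-site.xml \
  etl-job.jar
```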

Re: Map one RDD into two RDD

2015-05-07 Thread Bill Q
RDD1 = RDD.filter() RDD2 = RDD.filter() *From:* Bill Q [mailto:bill.q@gmail.com] *Sent:* Thursday, May 7, 2015 4:55 PM *To:* Evo Eftimov *Cc:* user@spark.apache.org

Re: Map one RDD into two RDD

2015-05-07 Thread Bill Q
:* Bill Q [mailto:bill.q@gmail.com] *Sent:* Tuesday, May 5, 2015 10:42 PM *To:* user@spark.apache.org *Subject:* Map one RDD into two RDD Hi all, I have a large RDD that I map

Map one RDD into two RDD

2015-05-05 Thread Bill Q
Hi all, I have a large RDD that I map a function over. Based on the nature of each record in the input RDD, the function generates two types of data, and I would like to save each type into its own RDD. But I can't seem to find an efficient way to do it. Any suggestions? Many thanks. Bill -- Many
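The reply later in the thread suggests filtering the mapped RDD twice. A minimal sketch of that pattern: tag each record with an `Either` in one map pass, cache the result so the map runs once, then filter per type. The `classify` function and the even/odd split below are hypothetical stand-ins for the record-type logic; plain Scala collections are used so the snippet is self-contained, but the comments show the RDD form:

```scala
// On an RDD:
//   val mapped = rdd.map(classify).cache()   // cache so the map isn't recomputed
//   val typeA  = mapped.collect { case Left(a)  => a }   // or: mapped.filter(_.isLeft).map(_.left.get)
//   val typeB  = mapped.collect { case Right(b) => b }
def classify(n: Int): Either[Int, Int] =
  if (n % 2 == 0) Left(n) else Right(n)    // hypothetical record-type test

val mapped = Seq(1, 2, 3, 4, 5).map(classify)
val evens  = mapped.collect { case Left(n)  => n }
val odds   = mapped.collect { case Right(n) => n }
```

Without the cache, each filter would re-run the (possibly expensive) map function over the whole input.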

Too many files open with Spark 1.1 and CDH 5.1

2014-10-31 Thread Bill Q
Hi, I am trying to make Spark SQL 1.1 work to replace part of our ETL processes that are currently done by Hive 0.12. A common problem I have encountered is the "Too many files open" error. Once that happens, the query just fails. I started the spark-shell using ulimit -n 4096

Re: Too many files open with Spark 1.1 and CDH 5.1

2014-10-31 Thread Bill Q
sort-based shuffle might help in this regard. On Fri, Oct 31, 2014 at 3:25 PM, Bill Q bill.q@gmail.com wrote: Hi, I am trying to make Spark SQL 1.1 to work to replace part of our ETL processes that are currently done by Hive 0.12. A common problem that I have
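For reference, the mitigations the reply alludes to can be sketched as follows. In Spark 1.1 the default hash-based shuffle writes one file per (map task, reduce task) pair, which exhausts file descriptors on wide shuffles; `spark.shuffle.manager` and `spark.shuffle.consolidateFiles` are real Spark 1.x settings, though the fd limit value here is illustrative:

```shell
# Raise the per-process file-descriptor limit before launching the shell.
ulimit -n 16384

# Switch to sort-based shuffle: one output file per map task
# instead of one per (map, reduce) pair.
spark-shell --conf spark.shuffle.manager=sort

# Alternative for hash-based shuffle: consolidate intermediate shuffle files.
spark-shell --conf spark.shuffle.consolidateFiles=true
```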