Join operation on DStreams

2015-09-21 Thread guoxu1231
Hi Spark Experts, 

I'm trying to use join(otherStream, [numTasks]) on DStreams, which requires it
to be called on two DStreams of (K, V) and (K, W) pairs.

Usually with a plain RDD we can use keyBy(f) to build the (K, V) pairs;
however, I could not find keyBy on DStream.

My question is:
What is the expected way to build (K, V) pair in DStream?
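What I'm doing for now is mapping each record to a pair by hand. A minimal
sketch of the idea (`extract_key` is a made-up key function here, and the
actual Spark streaming calls are shown only as comments):

```python
def extract_key(record):
    # Hypothetical key function: e.g. records are dicts keyed by user_id.
    return record["user_id"]

def to_pair(record):
    # Mimic RDD.keyBy(f): produce (f(record), record).
    return (extract_key(record), record)

# With a real StreamingContext this would be:
#   pairs = stream.map(to_pair)                          # DStream of (K, V)
#   joined = pairs.join(other_pairs)                     # join on K
# or, reusing the RDD API batch by batch:
#   pairs = stream.transform(lambda rdd: rdd.keyBy(extract_key))
```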


Thanks
Shawn




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Join-operation-on-DStreams-tp14228.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Help, pyspark.sql.List flatMap results become tuple

2014-12-30 Thread guoxu1231
Thanks Davies, it works in 1.2. 



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Help-pyspark-sql-List-flatMap-results-become-tuple-tp9961p9975.html



Help, pyspark.sql.List flatMap results become tuple

2014-12-29 Thread guoxu1231
Hi pyspark guys, 

I have a JSON file whose records are structured like below:

{"NAME": "George", "AGE": 35, "ADD_ID": 1212, "POSTAL_AREA": 1,
 "TIME_ZONE_ID": 1, "INTEREST": [{"INTEREST_NO": 1, "INFO": "x"},
 {"INTEREST_NO": 2, "INFO": "y"}]}
{"NAME": "John", "AGE": 45, "ADD_ID": 1213, "POSTAL_AREA": 1,
 "TIME_ZONE_ID": 1, "INTEREST": [{"INTEREST_NO": 2, "INFO": "x"},
 {"INTEREST_NO": 3, "INFO": "y"}]}

I'm using the Spark SQL API to manipulate the JSON data in the pyspark shell:

sqlContext = SQLContext(sc)
A400 = sqlContext.jsonFile('jason_file_path')

Row(ADD_ID=1212, AGE=35, INTEREST=[Row(INFO=u'x', INTEREST_NO=1),
Row(INFO=u'y', INTEREST_NO=2)], NAME=u'George', POSTAL_AREA=1,
TIME_ZONE_ID=1)
Row(ADD_ID=1213, AGE=45, INTEREST=[Row(INFO=u'x', INTEREST_NO=2),
Row(INFO=u'y', INTEREST_NO=3)], NAME=u'John', POSTAL_AREA=1,
TIME_ZONE_ID=1)

X = A400.flatMap(lambda i: i.INTEREST)

The flatMap results are as below: each element of the JSON array was
flattened to a plain tuple, not the pyspark.sql.Row I expected. I can only
access the flattened results by index, but they should be flattened to Row
(a namedtuple) and support access by field name.

(u'x', 1)
(u'y', 2)
(u'x', 2)
(u'y', 3)

My Spark version is 1.1.
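To make the expected behavior concrete, here is a local sketch of what I want,
with plain Python data standing in for the DataFrame. `Interest` is a made-up
namedtuple standing in for pyspark.sql.Row; the list comprehension plays the
role of A400.flatMap(lambda i: i.INTEREST):

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row (illustration only, not the real class).
Interest = namedtuple("Interest", ["INFO", "INTEREST_NO"])

rows = [
    [Interest(u'x', 1), Interest(u'y', 2)],   # George's INTEREST list
    [Interest(u'x', 2), Interest(u'y', 3)],   # John's INTEREST list
]

# Local equivalent of flatMap over the INTEREST lists.
flat = [item for interests in rows for item in interests]

# Expected: fields accessible by name, not only by index.
assert flat[0].INFO == u'x' and flat[0].INTEREST_NO == 1
```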







--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Help-pyspark-sql-List-flatMap-results-become-tuple-tp9961.html



Re: Help, pyspark.sql.List flatMap results become tuple

2014-12-29 Thread guoxu1231
The named tuples degenerate to plain tuples:

A400.map(lambda i: map(None, i.INTEREST))
===
[(u'x', 1), (u'y', 2)]
[(u'x', 2), (u'y', 3)]
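In the meantime, a local workaround sketch (my own, not from Spark): re-wrap
the degenerated tuples in a namedtuple to restore access by field name.
`Interest` here is a made-up stand-in for pyspark.sql.Row:

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row (illustration only).
Interest = namedtuple("Interest", ["INFO", "INTEREST_NO"])

# The plain tuples that flatMap actually produced.
flat_tuples = [(u'x', 1), (u'y', 2), (u'x', 2), (u'y', 3)]

# Re-wrap each tuple so fields are accessible by name again.
flat_rows = [Interest(*t) for t in flat_tuples]

assert flat_rows[0].INFO == u'x'
assert flat_rows[3].INTEREST_NO == 3
```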



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Help-pyspark-sql-List-flatMap-results-become-tuple-tp9961p9962.html