Join operation on DStreams

2015-09-21 Thread guoxu1231
Hi Spark Experts, I'm trying to use join(otherStream, [numTasks]) on DStreams, which must be called on two DStreams of (K, V) and (K, W) pairs. With an ordinary RDD we could use keyBy(f) to build the (K, V) pairs, but I could not find it on DStream. My question is: What is the
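
For readers following this thread, here is a minimal sketch of the keying step the question describes, written in plain Python so it runs without a Spark cluster. The helper names key_by and local_join and the sample records are hypothetical and not part of any Spark API; local_join only mimics the (K, (V, W)) shape that a pair join produces for a single batch.

```python
def key_by(f):
    """Return a function that turns a record x into the pair (f(x), x)."""
    def to_pair(x):
        return (f(x), x)
    return to_pair


def local_join(left, right):
    """Inner-join two lists of (key, value) pairs into (key, (v, w)) pairs."""
    index = {}
    for k, w in right:
        index.setdefault(k, []).append(w)
    return [(k, (v, w)) for k, v in left for w in index.get(k, [])]


# Hypothetical records standing in for one batch of each stream.
orders = ["a:1", "b:2"]
users = ["a:alice", "b:bob"]


def prefix(s):
    # Key extractor: the text before the colon.
    return s.split(":")[0]


keyed_orders = list(map(key_by(prefix), orders))
keyed_users = list(map(key_by(prefix), users))
joined = local_join(keyed_orders, keyed_users)
```

On real PySpark streams the same idea would presumably be written as stream.map(key_by(f)), or as stream.transform(lambda rdd: rdd.keyBy(f)) to reuse the RDD method, before calling join on the two keyed streams.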

Re: Help, pyspark.sql.List flatMap results become tuple

2014-12-30 Thread guoxu1231
Thanks Davies, it works in 1.2.

Help, pyspark.sql.List flatMap results become tuple

2014-12-29 Thread guoxu1231
Hi pyspark guys, I have a JSON file whose structure looks like this: {NAME:George, AGE:35, ADD_ID:1212, POSTAL_AREA:1, TIME_ZONE_ID:1, INTEREST:[{INTEREST_NO:1, INFO:x}, {INTEREST_NO:2, INFO:y}]} {NAME:John, AGE:45, ADD_ID:1213, POSTAL_AREA:1, TIME_ZONE_ID:1, INTEREST:[{INTEREST_NO:2, INFO:x},

Re: Help, pyspark.sql.List flatMap results become tuple

2014-12-29 Thread guoxu1231
The named tuple degenerates to a plain tuple. *A400.map(lambda i: map(None,i.INTEREST))* === [(u'x', 1), (u'y', 2)] [(u'x', 2), (u'y', 3)]
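
The symptom reported above can be illustrated outside Spark with the standard library. This is only a sketch of what "degenerating to a tuple" means, not a reproduction of the PySpark behaviour: Interest is a hypothetical stand-in for one element of the INTEREST array, with its fields ordered to match the output shown in the message.

```python
from collections import namedtuple

# Hypothetical stand-in for one element of the INTEREST array.
Interest = namedtuple("Interest", ["INFO", "INTEREST_NO"])

interests = [
    Interest(INFO="x", INTEREST_NO=1),
    Interest(INFO="y", INTEREST_NO=2),
]

# Converting to a plain tuple keeps the values but drops the field names.
plain = [tuple(i) for i in interests]

# Rebuilding the named tuple restores access by field name.
restored = [Interest(*p) for p in plain]
```

After the conversion, plain[0].INFO raises AttributeError while restored[0].INFO works again, which is why code that reads fields by name breaks once the elements have become plain tuples.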