Join operation on DStreams
Hi Spark experts, I'm trying to use join(otherStream, [numTasks]) on DStreams. It must be called on two DStreams of (K, V) and (K, W) pairs. On a plain RDD I would use keyBy(f) to build the (K, V) pairs, but I could not find it on DStream.

My question is: what is the expected way to build (K, V) pairs on a DStream?

Thanks,
Shawn

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Join-operation-on-DStreams-tp14228.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
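Not an authoritative answer, but since keyBy(f) on an RDD is equivalent to map(lambda v: (f(v), v)), and DStream does have map, the pair can be built the same way before the join. A minimal sketch; the stream names and the "id" key function are hypothetical:

```python
# keyBy(f) is just map(lambda v: (f(v), v)); DStream lacks keyBy but has map,
# so the (K, V) shape can be produced the same way before calling join.
def key_by(f):
    """Turn a record into a (key, record) pair, like RDD.keyBy."""
    return lambda v: (f(v), v)

# On real DStreams (hypothetical names), this would look like:
#   pairs1 = stream1.map(key_by(lambda rec: rec["id"]))
#   pairs2 = stream2.map(key_by(lambda rec: rec["id"]))
#   joined = pairs1.join(pairs2)

# The pairing logic itself, demonstrated on a plain record:
record = {"id": 7, "name": "George"}
print(key_by(lambda rec: rec["id"])(record))  # -> (7, {'id': 7, 'name': 'George'})
```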
Re: Help, pyspark.sql.List flatMap results become tuple
Thanks Davies, it works in 1.2.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Help-pyspark-sql-List-flatMap-results-become-tuple-tp9961p9975.html
Help, pyspark.sql.List flatMap results become tuple
Hi pyspark guys,

I have a json file, and its structure is like below:

{NAME:George, AGE:35, ADD_ID:1212, POSTAL_AREA:1, TIME_ZONE_ID:1, INTEREST:[{INTEREST_NO:1, INFO:x}, {INTEREST_NO:2, INFO:y}]}
{NAME:John, AGE:45, ADD_ID:1213, POSTAL_AREA:1, TIME_ZONE_ID:1, INTEREST:[{INTEREST_NO:2, INFO:x}, {INTEREST_NO:3, INFO:y}]}

I'm using the Spark SQL API to manipulate the json data in the pyspark shell:

*sqlContext = SQLContext(sc)
A400 = sqlContext.jsonFile('jason_file_path')*

/Row(ADD_ID=1212, AGE=35, INTEREST=[Row(INFO=u'x', INTEREST_NO=1), Row(INFO=u'y', INTEREST_NO=2)], NAME=u'George', POSTAL_AREA=1, TIME_ZONE_ID=1)
Row(ADD_ID=1213, AGE=45, INTEREST=[Row(INFO=u'x', INTEREST_NO=2), Row(INFO=u'y', INTEREST_NO=3)], NAME=u'John', POSTAL_AREA=1, TIME_ZONE_ID=1)/

*X = A400.flatMap(lambda i: i.INTEREST)*

The flatMap results are like below: each element of the json array was flattened to a plain tuple, not the pyspark.sql.Row I expected. I can only access the flattened results by index, but they are supposed to be flattened to Row (a namedtuple) and support access by name.

(u'x', 1)
(u'y', 2)
(u'x', 2)
(u'y', 3)

My Spark version is 1.1.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Help-pyspark-sql-List-flatMap-results-become-tuple-tp9961.html
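Until upgrading, one possible workaround is to rewrap the plain tuples with a namedtuple to restore access by name. A sketch; the Interest class and its field order are assumptions inferred from the printed (INFO, INTEREST_NO) tuples above:

```python
from collections import namedtuple

# Field order matches the printed tuples: (u'x', 1) -> (INFO, INTEREST_NO).
Interest = namedtuple("Interest", ["INFO", "INTEREST_NO"])

# On the real data this would be (hypothetical):
#   X = A400.flatMap(lambda i: [Interest(*t) for t in i.INTEREST])

# Rewrapping one of the flattened tuples restores named access:
row = Interest(*(u"x", 1))
print(row.INFO, row.INTEREST_NO)  # -> x 1
```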
Re: Help, pyspark.sql.List flatMap results become tuple
The named tuple degenerates to a plain tuple:

*A400.map(lambda i: map(None, i.INTEREST))*
===
[(u'x', 1), (u'y', 2)]
[(u'x', 2), (u'y', 3)]

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Help-pyspark-sql-List-flatMap-results-become-tuple-tp9961p9962.html
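The degeneration is unsurprising in itself: a namedtuple is a tuple subclass, so any element-wise copy (map(None, ...) in Python 2, or tuple(...)) yields a plain tuple with no named access. A small illustration using a stand-in Row class, not the actual pyspark.sql.Row:

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row, which behaves like a namedtuple.
Row = namedtuple("Row", ["INFO", "INTEREST_NO"])
r = Row(INFO=u"x", INTEREST_NO=1)

# A namedtuple *is* a tuple...
assert isinstance(r, tuple)

# ...so copying it element-wise produces a plain tuple, losing named access:
plain = tuple(r)
print(plain)  # -> ('x', 1)
```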