Converting Apache log string into map using delimiter

2014-11-11 Thread YaoPau
I have an RDD of logs that look like this:

/no_cache/bi_event?Log=0pg_inst=517638988975678942pg=fow_mwever=c.2.1.8site=xyz.compid=156431807121222351rid=156431666543211500srch_id=156431666581865115row=6seq=1tot=1tsp=1cmp=thmb_12co_txt_url=Viewinget=clickthmb_type=pct=uc=579855lnx=SPGOOGBRANDCAMPref_url=http%3A%2F%2Fwww.abcd.com

The pairs are separated by "&", and the key and value of each pair are
separated by "=". Hive has a str_to_map function
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions
that converts such a string to a map, so that the following works:

mappedString["site"] will return "xyz.com"

What's the most efficient way to do this in Scala + Spark?
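
For illustration, here is a minimal sketch of a str_to_map-style helper in plain Scala. It assumes the pairs are delimited by "&" and keys from values by "="; the names StrToMap and strToMap and the default delimiters are hypothetical, not part of Spark or Hive.

```scala
import java.util.regex.Pattern

// Hypothetical helper, loosely modeled on Hive's str_to_map.
// Assumption: pairs are separated by pairDelim, key and value by kvDelim.
object StrToMap {
  def strToMap(text: String,
               pairDelim: String = "&",
               kvDelim: String = "="): Map[String, String] =
    text
      .split(Pattern.quote(pairDelim)) // quote so delimiters are not treated as regex
      .iterator
      .filter(_.nonEmpty)              // ignore empty pieces such as a trailing "&"
      .map { pair =>
        // Limit 2 keeps any later delimiter inside the value and
        // preserves an empty value for input like "key=".
        pair.split(Pattern.quote(kvDelim), 2) match {
          case Array(k, v) => k -> v
          case Array(k)    => k -> "" // no delimiter found: treat value as empty
        }
      }
      .toMap
}
```

Since the function only depends on its input string, it could presumably be passed to an RDD[String] with something like logs.map(line => StrToMap.strToMap(line)), where logs is an assumed name for the RDD.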



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Converting-Apache-log-string-into-map-using-delimiter-tp18641.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Converting Apache log string into map using delimiter

2014-11-11 Thread YaoPau
OK I got it working with:

z.map(row => (row.map(element => element.split("=")(0)) zip
  row.map(element => element.split("=")(1))).toMap)

But I'm guessing there is a more efficient way than to create two separate
lists and then zip them together and then convert the result into a map.






Re: Converting Apache log string into map using delimiter

2014-11-11 Thread Sean Owen
I think it would be faster/more compact as:

z.map(_.map { element =>
  val tokens = element.split("=")
  (tokens(0), tokens(1))
}.toMap)

(That's probably 95% right but I didn't compile or test it.)
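
One caveat with indexing tokens(1) directly: String.split with a single argument drops trailing empty strings, so an element such as "key=" yields one token and tokens(1) would throw. Below is a hedged sketch of a more tolerant variant. It assumes, as the snippet above does, that each row is already a sequence of "key=value" strings; the names RowToMap and rowToMap are hypothetical, and URL-decoding the values is an added assumption based on the encoded ref_url in the sample log.

```scala
import java.net.URLDecoder

object RowToMap {
  // Assumption: each row is already split into "key=value" elements.
  // Elements without "=" are skipped instead of throwing.
  def rowToMap(row: Seq[String]): Map[String, String] =
    row.iterator
      .map(_.split("=", 2)) // limit 2: split once, keep empty values
      .collect { case Array(k, v) => k -> URLDecoder.decode(v, "UTF-8") }
      .toMap
}
```

With z as in the snippet above, this could be used as z.map(RowToMap.rowToMap). Note that URLDecoder.decode throws on malformed percent sequences, so drop the decoding step if the values are not reliably URL-encoded.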
