Re: Calculate sum of values in 2nd element of tuple

2016-01-03 Thread Roberto Congiu
For the first one, input.map { case (x, l) => (x, l.reduce(_ + _)) } will do what you need. For the second, yes, there's a difference: one is a List, the other is a Tuple. See for instance:

    val a = (1,2,3)
    a.getClass.getName
    res4: String = scala.Tuple3

You should look up tuples
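A minimal, self-contained sketch of the first answer. It assumes the input is a plain Scala List of (String, List[Int]) pairs; the thread does not show the actual type, and the same lambda works unchanged inside an RDD's map. sumSecondSafe is an illustrative addition: reduce throws on an empty list, while sum returns 0. The main method also prints the runtime class names that tell a tuple apart from a List.

```scala
object TupleSum {
  // Sum the list held in the second element of each (key, list) pair.
  // Throws UnsupportedOperationException if any list is empty.
  def sumSecond(input: List[(String, List[Int])]): List[(String, Int)] =
    input.map { case (x, l) => (x, l.reduce(_ + _)) }

  // Same idea, but an empty list sums to 0 instead of throwing.
  def sumSecondSafe(input: List[(String, List[Int])]): List[(String, Int)] =
    input.map { case (x, l) => (x, l.sum) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data.
    val input = List(("a", List(1, 2, 3)), ("b", List(10, 20)))
    println(sumSecond(input))               // List((a,6), (b,30))
    println((1, 2, 3).getClass.getName)     // scala.Tuple3
    println(List(1, 2, 3).getClass.getName) // scala.collection.immutable.$colon$colon
  }
}
```

On a Spark pair RDD, rdd.mapValues(_.sum) is an equivalent that leaves the keys untouched.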

Re: Best practices to handle corrupted records

2015-10-15 Thread Roberto Congiu
I came to a similar solution to a similar problem. I deal with a lot of CSV files from many different sources, and they are often malformed. However, I just have Success/Failure. Maybe you should make SuccessWithWarnings a subclass of Success, or get rid of it altogether, making the warnings
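A sketch of the first suggestion, making SuccessWithWarnings a subclass of Success, so that code which only cares about success versus failure can match on Success and still accept results that carry warnings. Everything here is hypothetical: ParseResult, parseLine, describe and the field-count rule are illustrative names, not an existing library, and the CSV split is deliberately naive (no quoted fields).

```scala
object CsvParsing {
  // Hypothetical result hierarchy (illustrative names only).
  sealed trait ParseResult[+A]
  // Not a case class, so SuccessWithWarnings can extend it
  // (case-to-case inheritance is not allowed in Scala).
  class Success[+A](val value: A) extends ParseResult[A]
  final class SuccessWithWarnings[+A](v: A, val warnings: List[String])
      extends Success[A](v)
  final case class Failure(line: String, reason: String) extends ParseResult[Nothing]

  // Hypothetical rule: a record needs exactly `expected` comma-separated fields.
  // Extra fields are dropped with a warning; missing fields are a failure.
  def parseLine(line: String, expected: Int): ParseResult[Vector[String]] = {
    val fields = line.split(",", -1).map(_.trim).toVector
    if (fields.length < expected)
      Failure(line, s"expected $expected fields, got ${fields.length}")
    else if (fields.length > expected)
      new SuccessWithWarnings(fields.take(expected),
        List(s"dropped ${fields.length - expected} extra field(s)"))
    else new Success(fields)
  }

  // A Success pattern also matches SuccessWithWarnings, so callers that
  // ignore warnings can drop the first case entirely.
  def describe(r: ParseResult[Vector[String]]): String = r match {
    case w: SuccessWithWarnings[_] => s"ok with warnings: ${w.warnings.mkString("; ")}"
    case s: Success[_]             => s"ok: ${s.value}"
    case Failure(_, reason)        => s"failed: $reason"
  }

  def main(args: Array[String]): Unit =
    List("a,b,c", "a,b,c,d", "a,b").map(parseLine(_, 3)).map(describe).foreach(println)
}
```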

Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Roberto Congiu
If HDFS is on a Linux VM, you could also mount it with FUSE and export it with Samba.

2015-08-29 2:26 GMT-07:00 Ted Yu yuzhih...@gmail.com:
See https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html FYI
On Sat, Aug 29, 2015 at 1:04 AM, Akhil Das

Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Roberto Congiu
It depends. If HDFS is running under Windows, FUSE won't work, but if HDFS is on a Linux VM, box, or cluster, you can have the Linux box/VM mount HDFS through FUSE and at the same time export the mount point over Samba. At that point, your Windows machine can just connect to the Samba share. R.

Re: Local Spark talking to remote HDFS?

2015-08-25 Thread Roberto Congiu
August 2015 at 20:43, Roberto Congiu roberto.con...@gmail.com wrote:
When you launch your HDP guest VM, it most likely gets launched with NAT and an address on a private network (192.168.x.x), so on your Windows host you should use that address (you can find out using ifconfig on the guest OS

Re: Local Spark talking to remote HDFS?

2015-08-25 Thread Roberto Congiu
I can't imagine I'm the only person on the planet wanting to do this. Anyway, thanks for trying to help. Dino.

On 25 August 2015 at 08:22, Roberto Congiu roberto.con...@gmail.com wrote:
Port 8020 is not the only port you need tunnelled for HDFS to work. If you only list
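Why tunnelling 8020 alone is not enough: the NameNode only serves metadata over its RPC port, and the client then reads and writes blocks by connecting directly to each DataNode at the address the NameNode reports. Below is a minimal, hypothetical helper that just probes TCP reachability from the client side. The ports are Hadoop 2.x defaults (8020 for NameNode RPC, 50010 for dfs.datanode.address) and the default host is a placeholder, so check your own cluster's configuration before relying on them.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object HdfsPortCheck {
  // True if a TCP connection to host:port can be opened within timeoutMs.
  // Any failure (refused, timeout, unknown host) yields false.
  def reachable(host: String, port: Int, timeoutMs: Int = 2000): Boolean =
    Try {
      val socket = new Socket()
      try socket.connect(new InetSocketAddress(host, port), timeoutMs)
      finally socket.close()
    }.isSuccess

  def main(args: Array[String]): Unit = {
    // Placeholder VM address (hypothetical); pass the real one as the first argument.
    val host = args.headOption.getOrElse("192.168.56.101")
    // Assumed Hadoop 2.x defaults: NameNode RPC and DataNode data transfer.
    val ports = List(8020 -> "NameNode RPC", 50010 -> "DataNode data transfer")
    for ((port, role) <- ports)
      println(s"$role ($host:$port) reachable: ${reachable(host, port)}")
  }
}
```

If the DataNode port answers from inside the VM but the client still fails from the host, a common cause is that the NameNode reports DataNode addresses the client cannot route to; the HDFS client setting dfs.client.use.datanode.hostname makes it connect by hostname instead, which pairs well with the /etc/hosts approach mentioned in this thread.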

Re: Local Spark talking to remote HDFS?

2015-08-24 Thread Roberto Congiu
When you launch your HDP guest VM, it most likely gets launched with NAT and an address on a private network (192.168.x.x), so on your Windows host you should use that address (you can find out using ifconfig on the guest OS). I usually add an entry to my /etc/hosts for VMs that I use often if

Re: SPARK sql: Need JSON back instead of row

2015-08-21 Thread Roberto Congiu
2015-08-21 3:17 GMT-07:00 smagadi sudhindramag...@fico.com:
teenagers.toJSON gives the JSON, but it does not preserve the parent ids, meaning if the input was {"name":"Yin","address":{"city":"Columbus","state":"Ohio"},"age":20}
val x = sqlContext.sql("SELECT name, address.city, address.state, age FROM
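One way to keep the nesting, sketched against the Spark 2.2+ SparkSession API rather than the 1.x sqlContext used in the thread, and not necessarily what the reply went on to suggest. Selecting address.city and address.state promotes them to top-level columns, so toJSON has no parent to emit; rebuilding the struct with functions.struct (or simply selecting the address column) keeps it. The sample record is the one quoted in the question; NestedJson and flatAndNested are hypothetical names.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct}

object NestedJson {
  // Returns (flattened JSON lines, nested JSON lines) for the sample record.
  def flatAndNested(spark: SparkSession): (Array[String], Array[String]) = {
    import spark.implicits._
    // Sample record from the question, with JSON quoting.
    val input = Seq("""{"name":"Yin","address":{"city":"Columbus","state":"Ohio"},"age":20}""").toDS()
    val people = spark.read.json(input) // Dataset[String] overload needs Spark 2.2+

    // address.city / address.state become top-level "city" and "state" columns.
    val flat = people.select(col("name"), col("address.city"), col("address.state"), col("age"))

    // Rebuilding the struct keeps "address" as a parent object in the JSON output.
    val nested = people.select(
      col("name"),
      struct(col("address.city").as("city"), col("address.state").as("state")).as("address"),
      col("age"))

    (flat.toJSON.collect(), nested.toJSON.collect())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("nested-json").getOrCreate()
    try {
      val (flat, nested) = flatAndNested(spark)
      flat.foreach(println)
      nested.foreach(println)
    } finally spark.stop()
  }
}
```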

Re: Nested DataFrame(SchemaRDD)

2015-06-23 Thread Roberto Congiu
I wrote a brief how-to on building nested records in Spark and storing them in Parquet here: http://www.congiu.com/creating-nested-data-parquet-in-spark-sql/

2015-06-23 16:12 GMT-07:00 Richard Catlin richard.m.cat...@gmail.com:
How do I create a DataFrame (SchemaRDD) with a nested array of Rows
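A minimal sketch of a DataFrame with a nested array of rows, using case classes (a Seq of case-class values is encoded as array<struct<...>>) and a Parquet round trip. It assumes Spark 2.x or later with spark-sql on the classpath; Item, Order, NestedParquet and the output path are hypothetical, and this is not necessarily the approach taken in the linked how-to. Building an explicit StructType containing ArrayType(StructType(...)) and passing it with Rows to createDataFrame is the schema-first alternative.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record types: Seq[Item] becomes an array<struct<sku,qty>> column.
case class Item(sku: String, qty: Int)
case class Order(id: Long, customer: String, items: Seq[Item])

object NestedParquet {
  def writeAndRead(spark: SparkSession, path: String): DataFrame = {
    import spark.implicits._
    val orders = Seq(
      Order(1L, "alice", Seq(Item("A-1", 2), Item("B-7", 1))),
      Order(2L, "bob", Seq(Item("C-3", 5)))
    ).toDF()
    // Parquet stores the nested array of structs natively; no flattening needed.
    orders.write.mode("overwrite").parquet(path)
    spark.read.parquet(path)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("nested-parquet").getOrCreate()
    try {
      // Hypothetical output path; pass your own as the first argument.
      val df = writeAndRead(spark, args.headOption.getOrElse("/tmp/orders.parquet"))
      df.printSchema()
      df.show(false)
    } finally spark.stop()
  }
}
```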