Re: Load multiple CSV from different paths

2017-07-05 Thread Didac Gil
Thanks man!

That was the key:

sources = […].toSeq

sources: _*

I learnt something new about Scala.
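
For anyone else reading the thread, here is a minimal plain-Scala sketch of what `.toSeq` plus `: _*` does. It does not use Spark: `readAll` is a hypothetical stand-in that only has the same parameter shape as `csv(paths: String*)`, and the file paths are made up.

```scala
object VarargsSketch {
  // Hypothetical stand-in with the same parameter shape as
  // DataFrameReader.csv(paths: String*); it only echoes what it received.
  def readAll(paths: String*): Seq[String] = paths.toList

  // A comma-separated list of made-up paths.
  val paths = "/data/a/part1.csv,/data/b/part2.csv"

  // split returns an Array[String]; toSeq turns it into a Seq[String].
  val sources: Seq[String] = paths.split(',').toSeq

  // `sources: _*` expands the sequence into individual arguments, so this call
  // is equivalent to readAll("/data/a/part1.csv", "/data/b/part2.csv").
  val expanded: Seq[String] = readAll(sources: _*)

  def main(args: Array[String]): Unit =
    expanded.foreach(println)
}
```

Passing `sources` without the `: _*` annotation does not compile, because a `Seq[String]` is not a `String`.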

> On 5 Jul 2017, at 16:29, Radhwane Chebaane wrote:
> 
> Hi,
> 
> According to the Spark 2.x documentation, org.apache.spark.sql.DataFrameReader
> has this method:
> def csv(paths: String*): DataFrame
> 
> So you can unpack your Array of paths like this:
> val sources = paths.split(',').toSeq
> spark.read.option("header", "false")
>   .schema(custom_schema)
>   .option("delimiter", "\t")
>   .option("mode", "DROPMALFORMED")
>   .csv(sources: _*)
> 
> In Spark 1.6.x I think this may work with spark-csv:
> 
> // Spark 1.6 has no SparkSession, so this assumes an existing SQLContext named sqlContext.
> sqlContext.read.format("com.databricks.spark.csv").option("header", "false")
>   .schema(custom_schema)
>   .option("delimiter", "\t")
>   .option("mode", "DROPMALFORMED")
>   .load(sources: _*)
> 
> Cheers,
> Radhwane Chebaane
> 

Didac Gil de la Iglesia
PhD in Computer Science
didacg...@gmail.com
Spain: +34 696 285 544
Sweden: +46 (0)730229737
Skype: didac.gil.de.la.iglesia





Re: Load multiple CSV from different paths

2017-07-05 Thread Radhwane Chebaane
Hi,

According to the Spark 2.x documentation, org.apache.spark.sql.DataFrameReader
has this method:

def csv(paths: String*): DataFrame


So you can unpack your Array of paths like this:

val sources = paths.split(',').toSeq

spark.read.option("header", "false")
  .schema(custom_schema)
  .option("delimiter", "\t")
  .option("mode", "DROPMALFORMED")
  .csv(sources: _*)
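
One practical caveat with `paths.split(',')`: it keeps any whitespace around each entry and yields an empty string for a doubled comma, and those values would then be passed on as paths. A small sketch of a more forgiving split follows; `PathList.parse` is a hypothetical helper name and the input is made up.

```scala
object PathList {
  // Splits a comma-separated string of paths, trims surrounding whitespace,
  // and drops empty entries such as the one produced by "a,,b".
  def parse(paths: String): Seq[String] =
    paths.split(',').map(_.trim).filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit =
    // Made-up input with a stray space and a doubled trailing comma.
    parse("/data/a/part1.csv, /data/b/part2.csv,,").foreach(println)
}
```

The cleaned result can be expanded the same way as before, for example `.csv(PathList.parse(paths): _*)`.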


In Spark 1.6.x I think this may work with spark-csv:

// Spark 1.6 has no SparkSession, so this assumes an existing SQLContext named sqlContext.
sqlContext.read.format("com.databricks.spark.csv").option("header", "false")
  .schema(custom_schema)
  .option("delimiter", "\t")
  .option("mode", "DROPMALFORMED")
  .load(sources: _*)



Cheers,
Radhwane Chebaane

2017-07-05 16:08 GMT+02:00 Didac Gil:

> Hi,
>
> Do you know any simple way to load multiple CSV files (same schema) that
> are in different paths?
> Wildcards are not a solution, as I want to load specific CSV files from
> different folders.
>
> I came across a solution
> (https://stackoverflow.com/questions/37639956/how-to-import-multiple-csv-files-in-a-single-load)
> that suggests something like
>
> spark.read.format("csv").option("header", "false")
>   .schema(custom_schema)
>   .option("delimiter", "\t")
>   .option("mode", "DROPMALFORMED")
>   .load(paths.split(','))
>
> However, even though it says that this approach works in Spark 2.x, I
> can't find an implementation of load that accepts an Array[String] as an
> input parameter.
>
> Thanks in advance for your help.
>


-- 

Radhwane Chebaane
Distributed systems engineer, Mindlytix

Mail: radhw...@mindlytix.com
Mobile: +33 695 588 906

Skype: rad.cheb



Load multiple CSV from different paths

2017-07-05 Thread Didac Gil
Hi,

Do you know any simple way to load multiple CSV files (same schema) that are in
different paths?
Wildcards are not a solution, as I want to load specific CSV files from
different folders.

I came across a solution
(https://stackoverflow.com/questions/37639956/how-to-import-multiple-csv-files-in-a-single-load)
that suggests something like

spark.read.format("csv").option("header", "false")
  .schema(custom_schema)
  .option("delimiter", "\t")
  .option("mode", "DROPMALFORMED")
  .load(paths.split(','))

However, even though it says that this approach works in Spark 2.x, I can't
find an implementation of load that accepts an Array[String] as an input
parameter.

Thanks in advance for your help.


Didac Gil de la Iglesia
PhD in Computer Science
didacg...@gmail.com
Spain: +34 696 285 544
Sweden: +46 (0)730229737
Skype: didac.gil.de.la.iglesia


