RE: Rename Several Aggregated Columns

2016-03-22 Thread Andres.Fernandez
Thank you! Yes, that's the way to go, taking care to select them in the proper 
order first. I added a select before the toDF and it does the trick.
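
For reference, a minimal sketch of that select-then-toDF approach. The names df, g1, g2 
and the contents of the fields array are only illustrative stand-ins for the real 
DataFrame, the two grouping columns and the ~500 field names:

import org.apache.spark.sql.functions.{col, sum}

val fields = Array("f1", "f2")   // stand-ins for the field names held in the array
val sums = fields.map(f => sum(col(f)))
val grouped = df.groupBy("g1", "g2").agg(sums.head, sums.tail: _*)

// select the columns in a known order, then rename them all at once with toDF
val ordered = grouped.select(
  (Array("g1", "g2") ++ fields.map(f => s"sum($f)")).map(col): _*)
val renamed = ordered.toDF(Array("g1", "g2") ++ fields: _*)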

From: Sunitha Kambhampati [mailto:skambha...@gmail.com]
Sent: Friday, March 18, 2016 5:46 PM
To: Fernandez, Andres
Cc: user@spark.apache.org
Subject: Re: Rename Several Aggregated Columns


One way is to rename the columns using toDF.

For example:


val df = Seq((1, 2),(1,4),(2,3) ).toDF("a","b")
df.printSchema()

val renamedf = df.groupBy('a).agg(sum('b)).toDF("mycola", "mycolb")
renamedf.printSchema()
Best regards,
Sunitha

On Mar 18, 2016, at 9:10 AM, 
andres.fernan...@wellsfargo.com wrote:

Good morning. I have a dataframe and would like to group by two fields, and 
perform a sum aggregation on more than 500 fields, though I would like to keep 
the same name for those 500 fields (instead of sum(Field)). I do have the 
field names in an array. Could anybody help with this question please?



Rename Several Aggregated Columns

2016-03-19 Thread Andres.Fernandez
Good morning. I have a dataframe and would like to group by two fields, and 
perform a sum aggregation on more than 500 fields, though I would like to keep 
the same name for those 500 fields (instead of sum(Field)). I do have the 
field names in an array. Could anybody help with this question please?



RE: Union Parquet, DataFrame

2016-03-01 Thread Andres.Fernandez
Worked perfectly. Thanks very much Silvio.

From: Silvio Fiorito [mailto:silvio.fior...@granturing.com]
Sent: Tuesday, March 01, 2016 2:14 PM
To: Fernandez, Andres; user@spark.apache.org
Subject: Re: Union Parquet, DataFrame

Just replied to your other email, but here’s the same thing:

Just do:

val df = sqlContext.read.load("/path/to/parquets/*")

If you do df.explain it’ll show the multiple input paths.
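
As a sketch of a closely related variant (using the sub-directory names from the 
question; adjust to the real paths), the Parquet reader also accepts several explicit 
paths, so no unionAll is needed:

val df = sqlContext.read.parquet(
  "/path/to/parquets/parquet1",
  "/path/to/parquets/parquet2",
  "/path/to/parquets/parquet3",
  "/path/to/parquets/parquet4")

// the physical plan should list all four input paths
df.explain()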

From: "andres.fernan...@wellsfargo.com" 
Date: Tuesday, March 1, 2016 at 12:01 PM
To: "user@spark.apache.org" 
Subject: Union Parquet, DataFrame

Good day colleagues. Quick question on Parquet and DataFrames. Right now I have 
4 Parquet files stored in HDFS under the same path:
/path/to/parquets/parquet1, /path/to/parquets/parquet2, 
/path/to/parquets/parquet3, /path/to/parquets/parquet4…
I want to perform a union on all these Parquet files. Is there any other way of 
doing this than DataFrame's unionAll?

Thank you very much in advance.

Andres Fernandez



Union Parquet, DataFrame

2016-03-01 Thread Andres.Fernandez
Good day colleagues. Quick question on Parquet and DataFrames. Right now I have 
4 Parquet files stored in HDFS under the same path:
/path/to/parquets/parquet1, /path/to/parquets/parquet2, 
/path/to/parquets/parquet3, /path/to/parquets/parquet4…
I want to perform a union on all these Parquet files. Is there any other way of 
doing this than DataFrame's unionAll?

Thank you very much in advance.

Andres Fernandez



RE: Save DataFrame to Hive Table

2016-03-01 Thread Andres.Fernandez
Good day colleagues. Quick question on Parquet and DataFrames. Right now I have 
4 Parquet files stored in HDFS under the same path:
/path/to/parquets/parquet1, /path/to/parquets/parquet2, 
/path/to/parquets/parquet3, /path/to/parquets/parquet4…
I want to perform a union on all these Parquet files. Is there any other way of 
doing this than DataFrame's unionAll?

Thank you very much in advance.

Andres Fernandez

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Tuesday, March 01, 2016 1:50 PM
To: Jeff Zhang
Cc: Yogesh Vyas; user@spark.apache.org
Subject: Re: Save DataFrame to Hive Table

Hi

It seems that your code is not specifying in which database your table is created.

Try this

scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> // Choose a database
scala> HiveContext.sql("show databases").show

scala> HiveContext.sql("use test")  // I chose test database
scala> HiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value 
STRING)")
scala> HiveContext.sql("desc TableName").show
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|     key|      int|   null|
|   value|   string|   null|
+--------+---------+-------+

// create a simple DF

val a = Seq((1, "Mich"), (2, "James"))
val b = a.toDF

// Let me keep it simple. Create a temporary table and do a simple insert/select.
// No need to overcomplicate it.

b.registerTempTable("tmp")

// Remember this temporary table is created in the SQL context, NOT the HiveContext,
// so HiveContext will NOT see that table
//
HiveContext.sql("INSERT INTO TableName SELECT * FROM tmp")
org.apache.spark.sql.AnalysisException: no such table tmp; line 1 pos 36

// This will work

sql("INSERT INTO TableName SELECT * FROM tmp")

sql("select count(1) from TableName").show
+---+
|_c0|
+---+
|  2|
+---+

HTH
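
For completeness, a minimal end-to-end sketch of the same-context approach (only a 
sketch: it assumes the table and the DataFrame are both created and written through 
one HiveContext, with the table and column names taken from the question):

import org.apache.spark.sql.SaveMode

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext.implicits._

hiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value STRING)")

// build the DataFrame against the same HiveContext, then append into the Hive table
val df = Seq((1, "Mich"), (2, "James")).toDF("key", "value")
df.write.mode(SaveMode.Append).insertInto("TableName")

hiveContext.sql("SELECT count(1) FROM TableName").show()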



Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 1 March 2016 at 06:33, Jeff Zhang wrote:

The following line does not execute the SQL, so the table is not created. Add 
.show() at the end to execute the SQL.

hiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value STRING)")

On Tue, Mar 1, 2016 at 2:22 PM, Yogesh Vyas wrote:
Hi,

I have created a DataFrame in Spark, and now I want to save it directly
into a Hive table. How do I do it?

I have created the Hive table using the following hiveContext:

HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(sc.sc());
hiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value STRING)");

I am using the following to save it into Hive:
DataFrame.write().mode(SaveMode.Append).insertInto("TableName");

But it gives the error:
Exception in thread "main" java.lang.RuntimeException: Table Not
Found: TableName
at scala.sys.package$.error(package.scala:27)
at 
org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:139)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:257)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:266)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:264)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:56)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:264)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:254)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
at 
scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
at 

RE: how to correctly run scala script using spark-shell through stdin (spark v1.0.0)

2016-01-27 Thread Andres.Fernandez
So far I still cannot find a way of running a small Scala script right after 
launching the shell and having the shell remain open. Is there a way of doing 
this? It feels like a simple/naive question, but I really couldn't find an answer.
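
One thing that might be worth trying (only a sketch, not verified against Spark 1.6): 
start the shell normally and run the script from the REPL prompt with the standard 
:load command, which executes the file and leaves the session open (the file name 
simple.scala is just the example from earlier in the thread):

~/spark/bin/spark-shell

scala> :load /path/to/simple.scala   // runs the script; the shell stays open afterwards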

From: Fernandez, Andres
Sent: Tuesday, January 26, 2016 2:53 PM
To: 'Ewan Leith'; Iulian Dragoș
Cc: user
Subject: RE: how to correctly run scala script using spark-shell through stdin 
(spark v1.0.0)

True, thank you. Is there a way of keeping the shell open (i.e. how to avoid the 
implicit :quit at the end)? Thank you both.

Andres

From: Ewan Leith [mailto:ewan.le...@realitymine.com]
Sent: Tuesday, January 26, 2016 1:50 PM
To: Iulian Dragoș; Fernandez, Andres
Cc: user
Subject: RE: how to correctly run scala script using spark-shell through stdin 
(spark v1.0.0)

I’ve just tried running this using a normal stdin redirect:

~/spark/bin/spark-shell < simple.scala

That worked: it started spark-shell, executed the script, then stopped the 
shell.

Thanks,
Ewan

From: Iulian Dragoș [mailto:iulian.dra...@typesafe.com]
Sent: 26 January 2016 15:00
To: fernandrez1987
Cc: user
Subject: Re: how to correctly run scala script using spark-shell through stdin 
(spark v1.0.0)


I don’t see -i in the output of spark-shell --help. Moreover, in master I get 
an error:

$ bin/spark-shell -i test.scala

bad option: '-i'

iulian

On Tue, Jan 26, 2016 at 3:47 PM, fernandrez1987 wrote:
spark-shell -i file.scala is not working for me in Spark 1.6.0. Was this
removed, or what do I have to take into account? The script does not get run
at all. What could be happening?

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-correctly-run-scala-script-using-spark-shell-through-stdin-spark-v1-0-0-tp12972p26071.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



--

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com



RE: how to correctly run scala script using spark-shell through stdin (spark v1.0.0)

2016-01-26 Thread Andres.Fernandez
True, thank you. Is there a way of keeping the shell open (i.e. how to avoid the 
implicit :quit at the end)? Thank you both.

Andres

From: Ewan Leith [mailto:ewan.le...@realitymine.com]
Sent: Tuesday, January 26, 2016 1:50 PM
To: Iulian Dragoș; Fernandez, Andres
Cc: user
Subject: RE: how to correctly run scala script using spark-shell through stdin 
(spark v1.0.0)

I’ve just tried running this using a normal stdin redirect:

~/spark/bin/spark-shell < simple.scala

That worked: it started spark-shell, executed the script, then stopped the 
shell.

Thanks,
Ewan

From: Iulian Dragoș [mailto:iulian.dra...@typesafe.com]
Sent: 26 January 2016 15:00
To: fernandrez1987
Cc: user
Subject: Re: how to correctly run scala script using spark-shell through stdin 
(spark v1.0.0)


I don’t see -i in the output of spark-shell --help. Moreover, in master I get 
an error:

$ bin/spark-shell -i test.scala

bad option: '-i'

iulian

On Tue, Jan 26, 2016 at 3:47 PM, fernandrez1987 wrote:
spark-shell -i file.scala is not working for me in Spark 1.6.0. Was this
removed, or what do I have to take into account? The script does not get run
at all. What could be happening?

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-correctly-run-scala-script-using-spark-shell-through-stdin-spark-v1-0-0-tp12972p26071.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



--

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com