Re:RE: how to set database in DataFrame.saveAsTable?

2016-02-21 Thread Mich Talebzadeh
 

Well my version of Spark is 1.5.2 

On 21/02/2016 23:54, Jacky Wang wrote: 

> df.write.saveAsTable("db_name.tbl_name") // works, spark-shell, latest spark 
> version 1.6.0 
> df.write.saveAsTable("db_name.tbl_name") // NOT work, spark-shell, old spark 
> version 1.4 
> 
> --
> 
> Jacky Wang 
> 
> At 2016-02-21 17:35:53, "Mich Talebzadeh" <m...@peridale.co.uk> wrote:
> 
> I looked at doc on this. It is not clear what goes behind the scene. Very 
> little documentation on it 
> 
> First in Hive a database has to exist before it can be used so sql("use 
> mytable") will not create a database for you. 
> 
> Also you cannot call your table mytable in database mytable! 
> 
> Remember in Hive the hierarchy is the database followed by tables. 
> 
> If you want to create a Hive database (I would not for every table) for this 
> purpose you can do 
> 
> scala> sql("create database if not exists mytable_db") 
> 
> res10: org.apache.spark.sql.DataFrame = [result: string] 
> 
> scala> sql("use mytable_db") 
> 
> res12: org.apache.spark.sql.DataFrame = [result: string] 
> 
> This puts you in the context of mytable_db database 
> 
> If you do 
> 
> hdfs dfs -ls /user/hive/warehouse 
> 
> You will see a directory called mytable_db.db is created 
> 
> /user/hive/warehouse/mytable.db 
> 
> Then you can create a table in Hive in mytable_db if you wish. The way I do 
> it personally is to register your DF as a temporary table and do 
> insert/select into Hive table 
> 
> scala> sql("use mytable_db") 
> 
> res21: org.apache.spark.sql.DataFrame = [result: string] 
> 
> scala> """ 
> 
> | CREATE TABLE mytable ( 
> 
> | INVOICENUMBER INT 
> 
> | ,PAYMENTDATE timestamp 
> 
> | ,NET DECIMAL(20,2) 
> 
> | ,VAT DECIMAL(20,2) 
> 
> | ,TOTAL DECIMAL(20,2) 
> 
> | ) 
> 
> | COMMENT 'a test table' 
> 
> | STORED AS ORC 
> 
> | TBLPROPERTIES ( "orc.compress"="ZLIB" ) 
> 
> | """ 
> 
> res22: String = 
> 
> " 
> 
> CREATE TABLE mytable ( 
> 
> INVOICENUMBER INT 
> 
> ,PAYMENTDATE timestamp 
> 
> ,NET DECIMAL(20,2) 
> 
> ,VAT DECIMAL(20,2) 
> 
> ,TOTAL DECIMAL(20,2) 
> 
> ) 
> 
> COMMENT 'a test table' 
> 
> STORED AS ORC 
> 
> TBLPROPERTIES ( "orc.compress"="ZLIB" ) 
> 
> " 
> 
> scala> sql(sqltext) 
> 
> res6: org.apache.spark.sql.DataFrame = [result: string] 
> 
> My DF is called "a" below so I register it as a temp table called tmp 
> 
> a.toDF.registerTempTable("tmp") 
> 
> Then just insert/select into Hicve table mytable from tmp 
> 
> scala> sqltext = "INSERT INTO mytable SELECT * FROM tmp" 
> 
> sqltext: String = INSERT INTO mytable SELECT * FROM tmp 
> 
> scala> sql(sqltext) 
> 
> res10: org.apache.spark.sql.DataFrame = [] 
> 
> scala> sql("select count(1) from mytable").show 
> 
> +---+ 
> 
> |_c0| 
> 
> +---+ 
> 
> | 65| 
> 
> HTH 
> 
> Dr Mich Talebzadeh 
> 
> LinkedIn _ 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  [1]_ 
> 
> http://talebzadehmich.wordpress.com [2] 
> 
> NOTE: The information in this email is proprietary and confidential. This 
> message is for the designated recipient only, if you are not the intended 
> recipient, you should destroy it immediately. Any information in this message 
> shall not be understood as given or endorsed by Peridale Technology Ltd, its 
> subsidiaries or their employees, unless expressly so stated. It is the 
> responsibility of the recipient to ensure that this email is virus free, 
> therefore neither Peridale Technology Ltd, its subsidiaries nor their 
> employees accept any responsibility. 
> 
> FROM: Glen [mailto:cng...@gmail.com] 
> SENT: 21 February 2016 03:26
> TO: gen tang <gen.tan...@gmail.com>
> CC: user@spark.apache.org
> SUBJECT: Re: how to set database in DataFrame.saveAsTable? 
> 
> Any example code? 
> 
> In pyspark: 
> 
> sqlContex.sql("use mytable") 
> 
> my_df.saveAsTable("tmp_spark_debug", mode="overwrite") 
> 
> 1. The code above seems not register the table in hive. I have to create 
> table from hdfs in hive, it reports some format error: rcformat and parquet. 
> 
> 2. Rerun the saveAsTable using mode="overwrite" in saveAsTable, it reports 
> the table already exists. 
> 
> 3. Sometimes it creates a directory in hive/warehouse/

Re:RE: how to set database in DataFrame.saveAsTable?

2016-02-21 Thread Jacky Wang
df.write.saveAsTable("db_name.tbl_name")  // works, spark-shell, latest spark 
version 1.6.0
df.write.saveAsTable("db_name.tbl_name")  // NOT work, spark-shell, old spark 
version 1.4






--

Jacky Wang

At 2016-02-21 17:35:53, "Mich Talebzadeh" <m...@peridale.co.uk> wrote:


I looked at doc on this. It is not clear what goes behind the scene. Very 
little documentation on it

 

First in Hive a database has to exist before it can be used so sql(“use 
mytable”) will not create a database for you.

 

Also you cannot call your table mytable in database mytable!

 

Remember in Hive the hierarchy is the database followed by tables.

 

If you want to create a Hive database (I would not for every table) for this 
purpose you can do

 

scala> sql("create database if not exists mytable_db")

res10: org.apache.spark.sql.DataFrame = [result: string]

 

scala> sql("use mytable_db")

res12: org.apache.spark.sql.DataFrame = [result: string]

 

This puts you in the context of mytable_db database

 

If you do

 

hdfs dfs -ls /user/hive/warehouse

 

You will see a directory called mytable_db.db is created

 

/user/hive/warehouse/mytable.db

 

Then you can create a table in Hive in mytable_db if you wish.  The way I do it 
personally is to register your DF as a temporary table and do insert/select 
into Hive table

 

scala> sql("use mytable_db")

res21: org.apache.spark.sql.DataFrame = [result: string]

 

scala> """

 | CREATE TABLE mytable (

 | INVOICENUMBER  INT

 | ,PAYMENTDATEtimestamp

 | ,NETDECIMAL(20,2)

 | ,VATDECIMAL(20,2)

 | ,TOTAL  DECIMAL(20,2)

 | )

 |  COMMENT 'a test table'

 | STORED AS ORC

 | TBLPROPERTIES ( "orc.compress"="ZLIB" )

 | """

res22: String =

"

CREATE TABLE mytable (

INVOICENUMBER  INT

,PAYMENTDATEtimestamp

,NETDECIMAL(20,2)

,VATDECIMAL(20,2)

,TOTAL  DECIMAL(20,2)

)

COMMENT 'a test table'

STORED AS ORC

TBLPROPERTIES ( "orc.compress"="ZLIB" )

"

scala> sql(sqltext)

res6: org.apache.spark.sql.DataFrame = [result: string]

 

 

My DF is called “a” below so I register it as a temp table called tmp

 

a.toDF.registerTempTable("tmp")

 

Then just insert/select into Hicve table mytable from tmp

 

scala> sqltext = "INSERT INTO mytable SELECT * FROM tmp"

sqltext: String = INSERT INTO mytable SELECT * FROM tmp

 

scala> sql(sqltext)

res10: org.apache.spark.sql.DataFrame = []

 

scala> sql("select count(1) from mytable").show

+---+

|_c0|

+---+

| 65|

|

 

|

 

HTH

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Technology Ltd, its subsidiaries nor their employees 
accept any responsibility.

 

 

From: Glen [mailto:cng...@gmail.com]
Sent: 21 February 2016 03:26
To: gen tang <gen.tan...@gmail.com>
Cc:user@spark.apache.org
Subject: Re: how to set database in DataFrame.saveAsTable?

 

Any example code?

 

In pyspark:

sqlContex.sql("use mytable")

my_df.saveAsTable("tmp_spark_debug", mode="overwrite")

 

1. The code above seems not register the table in hive. I have to create table 
from hdfs in hive, it reports some format error: rcformat and parquet.

2. Rerun the saveAsTable using  mode="overwrite" in saveAsTable, it reports the 
table already exists.

3. Sometimes it creates a directory in  hive/warehouse/tmp_spark_debug, not in 
hive/warehouse/mytable/tmp_spark_debug.

 

 

My goal is simple:

df.saveAsTable('blablabla')  // create a hive table in some database, then it 
can be visited by hive.

 

I tried lots of time, it seems there are lots of bug in pyspark. Or my mehtod 
is wrong?

 

2016-02-21 10:04 GMT+08:00 gen tang <gen.tan...@gmail.com>:

Hi,

 

You can use 

sqlContext.sql("use ")

before use dataframe.saveAsTable

 

Hope it could be helpful

 

Cheers

Gen

 

 

On Sun, Feb 21, 2016 at 9:55 AM, Glen <cng...@gmail.com> wrote:

For dataframe in spark, so the table can be visited by hive.


 

--

Jacky Wang

 





 

--

Jacky Wang

RE: how to set database in DataFrame.saveAsTable?

2016-02-21 Thread Mich Talebzadeh
I looked at doc on this. It is not clear what goes behind the scene. Very 
little documentation on it

 

First in Hive a database has to exist before it can be used so sql(“use 
mytable”) will not create a database for you. 

 

Also you cannot call your table mytable in database mytable!

 

Remember in Hive the hierarchy is the database followed by tables.

 

If you want to create a Hive database (I would not for every table) for this 
purpose you can do

 

scala> sql("create database if not exists mytable_db")

res10: org.apache.spark.sql.DataFrame = [result: string]

 

scala> sql("use mytable_db")

res12: org.apache.spark.sql.DataFrame = [result: string]

 

This puts you in the context of mytable_db database

 

If you do

 

hdfs dfs -ls /user/hive/warehouse

 

You will see a directory called mytable_db.db is created

 

/user/hive/warehouse/mytable.db

 

Then you can create a table in Hive in mytable_db if you wish.  The way I do it 
personally is to register your DF as a temporary table and do insert/select 
into Hive table

 

scala> sql("use mytable_db")

res21: org.apache.spark.sql.DataFrame = [result: string]

 

scala> """

 | CREATE TABLE mytable (

 | INVOICENUMBER  INT

 | ,PAYMENTDATEtimestamp

 | ,NETDECIMAL(20,2)

 | ,VATDECIMAL(20,2)

 | ,TOTAL  DECIMAL(20,2)

 | )

 |  COMMENT 'a test table'

 | STORED AS ORC

 | TBLPROPERTIES ( "orc.compress"="ZLIB" )

 | """

res22: String =

"

CREATE TABLE mytable (

INVOICENUMBER  INT

,PAYMENTDATEtimestamp

,NETDECIMAL(20,2)

,VATDECIMAL(20,2)

,TOTAL  DECIMAL(20,2)

)

COMMENT 'a test table'

STORED AS ORC

TBLPROPERTIES ( "orc.compress"="ZLIB" )

"

scala> sql(sqltext)

res6: org.apache.spark.sql.DataFrame = [result: string]

 

 

My DF is called “a” below so I register it as a temp table called tmp

 

a.toDF.registerTempTable("tmp")

 

Then just insert/select into Hicve table mytable from tmp

 

scala> sqltext = "INSERT INTO mytable SELECT * FROM tmp"

sqltext: String = INSERT INTO mytable SELECT * FROM tmp

 

scala> sql(sqltext)

res10: org.apache.spark.sql.DataFrame = []

 

scala> sql("select count(1) from mytable").show

+---+

|_c0|

+---+

| 65|


 

 

HTH

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/> 

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Technology Ltd, its subsidiaries nor their employees 
accept any responsibility.

 

 

From: Glen [mailto:cng...@gmail.com] 
Sent: 21 February 2016 03:26
To: gen tang <gen.tan...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: how to set database in DataFrame.saveAsTable?

 

Any example code?

 

In pyspark:

sqlContex.sql("use mytable")

my_df.saveAsTable("tmp_spark_debug", mode="overwrite")

 

1. The code above seems not register the table in hive. I have to create table 
from hdfs in hive, it reports some format error: rcformat and parquet.

2. Rerun the saveAsTable using  mode="overwrite" in saveAsTable, it reports the 
table already exists.

3. Sometimes it creates a directory in  hive/warehouse/tmp_spark_debug, not in 
hive/warehouse/mytable/tmp_spark_debug.

 

 

My goal is simple:

df.saveAsTable('blablabla')  // create a hive table in some database, then it 
can be visited by hive.

 

I tried lots of time, it seems there are lots of bug in pyspark. Or my mehtod 
is wrong?

 

2016-02-21 10:04 GMT+08:00 gen tang <gen.tan...@gmail.com 
<mailto:gen.tan...@gmail.com> >:

Hi,

 

You can use 

sqlContext.sql("use ")

before use dataframe.saveAsTable

 

Hope it could be helpful

 

Cheers

Gen

 

 

On Sun, Feb 21, 2016 at 9:55 AM, Glen <cng...@gmail.com 
<mailto:cng...@gmail.com> > wrote:

For dataframe in spark, so the table can be visited by hive.


 

-- 

Jacky Wang

 





 

-- 

Jacky Wang



Re: how to set database in DataFrame.saveAsTable?

2016-02-20 Thread Glen
Any example code?

In pyspark:
sqlContex.sql("use mytable")
my_df.saveAsTable("tmp_spark_debug", mode="overwrite")

1. The code above seems not register the table in hive. I have to create
table from hdfs in hive, it reports some format error: rcformat and parquet.
2. Rerun the saveAsTable using  mode="overwrite" in saveAsTable, it reports
the table already exists.
3. Sometimes it creates a directory in  hive/warehouse/tmp_spark_debug, not
in hive/warehouse/mytable/tmp_spark_debug.


My goal is simple:
df.saveAsTable('blablabla')  // create a hive table in some database, then
it can be visited by hive.

I tried lots of time, it seems there are lots of bug in pyspark. Or my
mehtod is wrong?

2016-02-21 10:04 GMT+08:00 gen tang :

> Hi,
>
> You can use
> sqlContext.sql("use ")
> before use dataframe.saveAsTable
>
> Hope it could be helpful
>
> Cheers
> Gen
>
>
> On Sun, Feb 21, 2016 at 9:55 AM, Glen  wrote:
>
>> For dataframe in spark, so the table can be visited by hive.
>>
>> --
>> Jacky Wang
>>
>
>


-- 
Jacky Wang


Re: how to set database in DataFrame.saveAsTable?

2016-02-20 Thread gen tang
Hi,

You can use
sqlContext.sql("use ")
before use dataframe.saveAsTable

Hope it could be helpful

Cheers
Gen


On Sun, Feb 21, 2016 at 9:55 AM, Glen  wrote:

> For dataframe in spark, so the table can be visited by hive.
>
> --
> Jacky Wang
>


how to set database in DataFrame.saveAsTable?

2016-02-20 Thread Glen
For dataframe in spark, so the table can be visited by hive.

-- 
Jacky Wang