Re: Spark 2 cannot create ORC table when CLUSTERED. This worked in Spark 1.6.1

2016-08-13 Thread Mich Talebzadeh
Hi,
SPARK-17047 created.

Thanks


Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 13 August 2016 at 02:54, Jacek Laskowski wrote:

> ...


Re: Spark 2 cannot create ORC table when CLUSTERED. This worked in Spark 1.6.1

2016-08-12 Thread Jacek Laskowski
Hi Mich,

Please file a JIRA issue, as it looks as though this part was overlooked.
Spark 2.0 supports less and less HiveQL as more and more features move to
native support.

(My take is that the days of Hive in Spark are numbered and Hive will
disappear from it soon.)

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Thu, Aug 11, 2016 at 10:02 AM, Mich Talebzadeh wrote:

> ...

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark 2 cannot create ORC table when CLUSTERED. This worked in Spark 1.6.1

2016-08-11 Thread Gourav Sengupta
Spark also reads ORC data very slowly, and if the Hive table is
partitioned, it simply hangs.


Regards,
Gourav

On Thu, Aug 11, 2016 at 6:02 PM, Mich Talebzadeh wrote:

> ...


Spark 2 cannot create ORC table when CLUSTERED. This worked in Spark 1.6.1

2016-08-11 Thread Mich Talebzadeh
This no longer works with the CLUSTERED BY clause in Spark 2, although it worked in Spark 1.6.1:

CREATE TABLE test.dummy2
(
    ID INT
  , CLUSTERED INT
  , SCATTERED INT
  , RANDOMISED INT
  , RANDOM_STRING VARCHAR(50)
  , SMALL_VC VARCHAR(10)
  , PADDING VARCHAR(10)
)
CLUSTERED BY (ID) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES ( "orc.compress"="SNAPPY",
  "orc.create.index"="true",
  "orc.bloom.filter.columns"="ID",
  "orc.bloom.filter.fpp"="0.05",
  "orc.stripe.size"="268435456",
  "orc.row.index.stride"="1" )

scala> HiveContext.sql(sqltext)
org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: CREATE TABLE ... CLUSTERED BY(line 2, pos 0)
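One possible workaround in Spark 2 is to create the bucketed table through the DataFrameWriter API instead of HiveQL DDL. The sketch below is only an illustration, assuming an existing SparkSession `spark` and a DataFrame `df` whose schema matches the table; note that tables bucketed this way use Spark's own bucketing layout, which Hive itself does not read:

```scala
// Sketch: bucketing via the DataFrameWriter API instead of
// CREATE TABLE ... CLUSTERED BY (assumes an existing DataFrame `df`
// with the columns of test.dummy2).
df.write
  .format("orc")
  .option("compression", "snappy")   // stands in for orc.compress=SNAPPY
  .bucketBy(256, "ID")               // stands in for CLUSTERED BY (ID) INTO 256 BUCKETS
  .sortBy("ID")                      // optional: sort rows within each bucket
  .saveAsTable("test.dummy2")
```

Spark can then exploit the bucketing for joins and aggregations on ID, though the ORC index and bloom-filter table properties from the HiveQL version have no direct equivalent in this API.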


Dr Mich Talebzadeh