Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-13 Thread Deepak Sharma
Hi Ajay
Looking at the Spark code, I can see you used a Hive context.
Can you try using a SQL context instead of the Hive context there?
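Something along these lines is what I mean; a minimal sketch, assuming the SAS table is read over JDBC with com.sas.rio.MVADriver (the URL, table name and credentials below are placeholders, not your actual values):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SasJdbcRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SasJdbcRead"))
    // Plain SQLContext instead of HiveContext
    val sqlContext = new SQLContext(sc)

    // Hypothetical connection details -- replace with the real SAS IOM URL and credentials
    val df = sqlContext.read.format("jdbc").options(Map(
      "url"      -> "jdbc:sasiom://sas-host:8591",
      "driver"   -> "com.sas.rio.MVADriver",
      "dbtable"  -> "sasLib.run_control",
      "user"     -> "user",
      "password" -> "password")).load()

    df.show()
  }
}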

Thanks
Deepak

On Mon, Jun 13, 2016 at 10:15 PM, Ajay Chander  wrote:

> Hi Mohit,
>
> Thanks for your time. Please find my response below.
>
> Did you try the same with another database?
> I do load data from MySQL and SQL Server the same way (through Spark
> SQL JDBC), and that works perfectly fine.
>
> As a workaround you can write the select statement yourself instead of
> just providing the table name?
> Yes, I did that too. It did not make any difference.
>
> Thank you,
> Ajay
>

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-12 Thread Mohit Jaggi
Looks like a bug in the code that generates the SQL query; why it would be
specific to SAS, I can't guess. Did you try the same with another database?
As a workaround you can write the select statement yourself instead of just
providing the table name; a sketch of what I mean follows.
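A rough sketch of that workaround, assuming the SAS source is read through Spark's JDBC data source (the URL and credentials are placeholders; the column names are the ones that appear in the logs further down in the thread, and sqlContext is whatever context your app already uses):

// Hand the JDBC source an explicit query (as a derived table) instead of a bare table name
val query = "(SELECT sr_no, start_dt, end_dt FROM sasLib.run_control) t"

val df = sqlContext.read.format("jdbc").options(Map(
  "url"      -> "jdbc:sasiom://sas-host:8591",   // hypothetical SAS IOM URL
  "driver"   -> "com.sas.rio.MVADriver",
  "dbtable"  -> query,
  "user"     -> "user",
  "password" -> "password")).load()

Note that Ajay reports above in the thread that, in his case, this did not change the generated query.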

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-11 Thread Ajay Chander
I tried implementing the same functionality through Scala as well, but no
luck so far. Just wondering if anyone here has tried using Spark SQL to read
a SAS dataset? Thank you

Regards,
Ajay

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
Mich, I completely agree with you. I built another Spark SQL application
which reads data from MySQL and SQL Server and writes the data into
Hive (parquet + snappy format). I have this problem only when I read directly
from the remote SAS system. The interesting part is that I am using the same
driver to read data through a pure Java app and the Spark app. It works fine
in the Java app, so I cannot blame the SAS driver here. Trying to understand
where the problem could be. Thanks for sharing this with me.
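For reference, a rough sketch of that MySQL-to-Hive path (all connection details and table names here are placeholders, not the actual application; sqlContext is assumed to be Hive-enabled so that saveAsTable lands in the Hive metastore):

// Read from MySQL over JDBC ...
val mysqlDf = sqlContext.read.format("jdbc").options(Map(
  "url"      -> "jdbc:mysql://mysql-host:3306/mydb",
  "driver"   -> "com.mysql.jdbc.Driver",
  "dbtable"  -> "some_table",
  "user"     -> "user",
  "password" -> "password")).load()

// ... and write into Hive as Parquet compressed with Snappy
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
mysqlDf.write.format("parquet").saveAsTable("mydb_hive.some_table")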


Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Mich Talebzadeh
I personally use Scala to do something similar. For example, here I extract
data from an Oracle table and store it in an ORC table in Hive. This is
compiled via sbt and run with spark-submit.

It is similar to your code, but in Scala. Note that I do not enclose my
column names in double quotes.

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

object ETL_scratchpad_dummy {
  def main(args: Array[String]) {
    val conf = new SparkConf().
      setAppName("ETL_scratchpad_dummy").
      set("spark.driver.allowMultipleContexts", "true")
    val sc = new SparkContext(conf)
    // Create sqlContext based on HiveContext
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._
    val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    println("\nStarted at"); sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
    HiveContext.sql("use oraclehadoop")
    var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb12"
    var _username : String = "scratchpad"
    var _password : String = ""

    // Get data from Oracle table scratchpad.dummy
    val d = HiveContext.load("jdbc",
      Map("url" -> _ORACLEserver,
          "dbtable" -> "(SELECT to_char(ID) AS ID, to_char(CLUSTERED) AS CLUSTERED, to_char(SCATTERED) AS SCATTERED, to_char(RANDOMISED) AS RANDOMISED, RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
          "user" -> _username,
          "password" -> _password))

    d.registerTempTable("tmp")
    //
    // Need to create and populate target ORC table oraclehadoop.dummy
    //
    HiveContext.sql("use oraclehadoop")
    //
    // Drop and create table dummy
    //
    HiveContext.sql("DROP TABLE IF EXISTS oraclehadoop.dummy")
    var sqltext : String = ""
    sqltext = """
    CREATE TABLE oraclehadoop.dummy (
       ID INT
     , CLUSTERED INT
     , SCATTERED INT
     , RANDOMISED INT
     , RANDOM_STRING VARCHAR(50)
     , SMALL_VC VARCHAR(10)
     , PADDING VARCHAR(10)
    )
    CLUSTERED BY (ID) INTO 256 BUCKETS
    STORED AS ORC
    TBLPROPERTIES (
      "orc.create.index"="true",
      "orc.bloom.filter.columns"="ID",
      "orc.bloom.filter.fpp"="0.05",
      "orc.compress"="SNAPPY",
      "orc.stripe.size"="16777216",
      "orc.row.index.stride"="1" )
    """
    HiveContext.sql(sqltext)
    //
    // Put data in Hive table. Clean up is already done
    //
    sqltext = """
    INSERT INTO TABLE oraclehadoop.dummy
    SELECT
      ID
    , CLUSTERED
    , SCATTERED
    , RANDOMISED
    , RANDOM_STRING
    , SMALL_VC
    , PADDING
    FROM tmp
    """
    HiveContext.sql(sqltext)
    println("\nFinished at"); sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
    sys.exit()
  }
}
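For what it is worth, on Spark 1.4 and later the same JDBC load can also be expressed through the DataFrameReader API rather than the older load("jdbc", Map(...)) form; a minimal equivalent sketch of the Oracle read above, reusing the same variables:

// Same Oracle read as above, expressed through DataFrameReader (Spark 1.4+)
val d = HiveContext.read.format("jdbc").options(Map(
  "url"      -> _ORACLEserver,
  "dbtable"  -> ("(SELECT to_char(ID) AS ID, to_char(CLUSTERED) AS CLUSTERED, " +
                 "to_char(SCATTERED) AS SCATTERED, to_char(RANDOMISED) AS RANDOMISED, " +
                 "RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)"),
  "user"     -> _username,
  "password" -> _password)).load()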

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
Hi Mich,

Thanks for the response. If you look at my programs, I am not writing my
queries with column names wrapped in a pair of "". The driver in my Spark
program is generating such a query, with column names in "", which I do not
want. On the other hand, I am using the same driver in my pure Java program,
which is attached; in that program the same driver generates a proper SQL
query without the "".
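For comparison, the working non-Spark path looks roughly like this when sketched in Scala over plain JDBC (the URL and credentials are placeholders; the query text is the one from the pure Java log below):

import java.sql.DriverManager

// Load the SAS driver and open a plain JDBC connection (hypothetical URL/credentials)
Class.forName("com.sas.rio.MVADriver")
val conn = DriverManager.getConnection("jdbc:sasiom://sas-host:8591", "user", "password")

// The driver executes unquoted column names without any problem here
val rs = conn.createStatement().executeQuery(
  "SELECT a.sr_no, a.start_dt, a.end_dt FROM sasLib.run_control a")
while (rs.next()) {
  println(Seq(rs.getString(1), rs.getString(2), rs.getString(3)).mkString(","))
}
conn.close()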

Pure Java log:

2016-06-10 10:35:21,584] INFO stmt(1.1)#executeQuery SELECT
a.sr_no,a.start_dt,a.end_dt FROM sasLib.run_control a; created result set
1.1.1; time= 0.122 secs (com.sas.rio.MVAStatement:590)
Spark SQL log:

[2016-06-10 10:29:05,834] INFO conn(2)#prepareStatement sql=SELECT
"SR_NO","start_dt","end_dt" FROM sasLib.run_control ; prepared statement
2.1; time= 0.038 secs (com.sas.rio.MVAConnection:538)

[2016-06-10 10:29:05,935] INFO ps(2.1)#executeQuery SELECT
"SR_NO","start_dt","end_dt" FROM sasLib.run_control ; created result set
2.1.1; time= 0.102 secs (com.sas.rio.MVAStatement:590)
Please find complete program and full logs attached in the below thread.
Thank you.

Regards,
Ajay


Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Mich Talebzadeh
Assuming I understood your query: in Spark SQL (that is, when you log in to
Spark SQL with something like spark-sql --master spark://:7077) you do not
need double quotes around column names for the SQL to work.

spark-sql> select "hello from Mich" from oraclehadoop.sales limit 1;
hello from Mich

Anything between a pair of "" will be interpreted as text, NOT a column name.

In Spark SQL you do not need double quotes. So simply

spark-sql> select prod_id, cust_id from sales limit 2;
17  28017
18  10419

HTH
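On the Spark side, the double quotes around column names in the generated JDBC query come from the JDBC data source's dialect handling. A custom dialect registered for the SAS driver could hand back the names unquoted; a rough sketch, assuming a Spark version where JdbcDialect exposes quoteIdentifier (it does in 2.x, but possibly not in older 1.x releases) and assuming a jdbc:sasiom URL prefix:

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect for the SAS driver: return column names unquoted
// instead of the default double-quoted form. The URL prefix is an assumption.
object SasDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:sasiom")
  override def quoteIdentifier(colName: String): String = colName
}

// Register it before creating the JDBC DataFrame
JdbcDialects.registerDialect(SasDialect)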

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
Hi again, has anyone in this group tried to access a SAS dataset through
Spark SQL? Thank you

Regards,
Ajay

On Friday, June 10, 2016, Ajay Chander  wrote:

> Hi Spark Users,
>
> I hope everyone here is doing great.
>
> I am trying to read data from SAS through Spark SQL and write it into HDFS.
> Initially, I started with a pure Java program; please find the program and
> logs in the attached file sas_pure_java.txt. The program ran successfully
> and it returned the data from SAS to Spark SQL. Please note the highlighted
> part in the log.
>
> My SAS dataset has 4 rows. The program ran successfully, so my output is:
>
> [2016-06-10 10:35:21,584] INFO stmt(1.1)#executeQuery SELECT
> a.sr_no,a.start_dt,a.end_dt FROM sasLib.run_control a; created result set
> 1.1.1; time= 0.122 secs (com.sas.rio.MVAStatement:590)
>
> [2016-06-10 10:35:21,630] INFO rs(1.1.1)#next (first call to next); time=
> 0.045 secs (com.sas.rio.MVAResultSet:773)
>
> 1,'2016-01-01','2016-01-31'
>
> 2,'2016-02-01','2016-02-29'
>
> 3,'2016-03-01','2016-03-31'
>
> 4,'2016-04-01','2016-04-30'
>
>
> Please find the full logs attached to this email in file sas_pure_java.txt.
>
> ___
>
>
> Now I am trying to do the same via Spark SQL. Please find my program and
> logs attached to this email in file sas_spark_sql.txt.
>
> Connection to the SAS dataset is established successfully, but please note
> the highlighted log below.
>
> [2016-06-10 10:29:05,834] INFO conn(2)#prepareStatement sql=SELECT
> "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; prepared statement
> 2.1; time= 0.038 secs (com.sas.rio.MVAConnection:538)
>
> [2016-06-10 10:29:05,935] INFO ps(2.1)#executeQuery SELECT
> "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; created result set
> 2.1.1; time= 0.102 secs (com.sas.rio.MVAStatement:590)
> Please find the full logs attached to this email in file
>  sas_spark_sql.txt
>
> I am using the same driver in both the pure Java and Spark SQL programs, but
> the query generated in Spark SQL has double quotes around the column names
> (highlighted above). SAS appears to treat the quoted names as string
> literals, so my resulting output for that query looks like this:
>
> +-++--+
> |  _c0| _c1|   _c2|
> +-++--+
> |SR_NO|start_dt|end_dt|
> |SR_NO|start_dt|end_dt|
> |SR_NO|start_dt|end_dt|
> |SR_NO|start_dt|end_dt|
> +-++--+
>
> Since both programs are using the same driver, com.sas.rio.MVADriver, the
> expected output should be the same as the pure Java program's output, but
> something else is happening behind the scenes.
>
> Any insights on this issue? Thanks for your time.
>
>
> Regards,
>
> Ajay
>