Hi Mohit,

Thanks for your time. Please find my responses below.
Did you try the same with another database? I do load the data from MySQL and SQL Server the same way (through Spark SQL JDBC), and that works perfectly alright.

As a workaround you can write the select statement yourself instead of just providing the table name? Yes, I did that too. It did not make any difference.

Thank you,
Ajay

On Sunday, June 12, 2016, Mohit Jaggi <mohitja...@gmail.com> wrote:

> Looks like a bug in the code generating the SQL query… why it would be
> specific to SAS, I can't guess. Did you try the same with another database?
> As a workaround you can write the select statement yourself instead of just
> providing the table name.
>
> On Jun 11, 2016, at 6:27 PM, Ajay Chander <itsche...@gmail.com> wrote:
>
> I tried implementing the same functionality through Scala as well, but no
> luck so far. Just wondering if anyone here has tried using Spark SQL to
> read a SAS dataset? Thank you
>
> Regards,
> Ajay
>
> On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
>
>> Mich, I completely agree with you. I built another Spark SQL application
>> which reads data from MySQL and SQL Server and writes the data into
>> Hive (parquet + snappy format). I have this problem only when I read
>> directly from the remote SAS system. The interesting part is that I am
>> using the same driver to read data through the pure Java app and the
>> Spark app. It works fine in the Java app, so I cannot blame the SAS
>> driver here. Trying to understand where the problem could be. Thanks for
>> sharing this with me.
>>
>> On Friday, June 10, 2016, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> I personally use Scala to do something similar. For example, here I
>>> extract data from an Oracle table and store it in an ORC table in Hive.
>>> This is compiled via sbt and run with spark-submit.
>>>
>>> It is similar to your code but in Scala. Note that I do not enclose my
>>> column names in double quotes.
>>>
>>> import org.apache.spark.SparkContext
>>> import org.apache.spark.SparkConf
>>> import org.apache.spark.sql.Row
>>> import org.apache.spark.sql.hive.HiveContext
>>> import org.apache.spark.sql.types._
>>> import org.apache.spark.sql.SQLContext
>>> import org.apache.spark.sql.functions._
>>>
>>> object ETL_scratchpad_dummy {
>>>   def main(args: Array[String]) {
>>>     val conf = new SparkConf().
>>>       setAppName("ETL_scratchpad_dummy").
>>>       set("spark.driver.allowMultipleContexts", "true")
>>>     val sc = new SparkContext(conf)
>>>     // Create sqlContext based on HiveContext
>>>     val sqlContext = new HiveContext(sc)
>>>     import sqlContext.implicits._
>>>     val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>     println("\nStarted at"); sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>>     HiveContext.sql("use oraclehadoop")
>>>     var _ORACLEserver: String = "jdbc:oracle:thin:@rhes564:1521:mydb12"
>>>     var _username: String = "scratchpad"
>>>     var _password: String = "xxxxxxxx"
>>>
>>>     // Get data from Oracle table scratchpad.dummy
>>>     val d = HiveContext.load("jdbc",
>>>       Map("url" -> _ORACLEserver,
>>>         "dbtable" -> "(SELECT to_char(ID) AS ID, to_char(CLUSTERED) AS CLUSTERED, to_char(SCATTERED) AS SCATTERED, to_char(RANDOMISED) AS RANDOMISED, RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
>>>         "user" -> _username,
>>>         "password" -> _password))
>>>
>>>     d.registerTempTable("tmp")
>>>     //
>>>     // Need to create and populate target ORC table oraclehadoop.dummy
>>>     //
>>>     HiveContext.sql("use oraclehadoop")
>>>     //
>>>     // Drop and create table dummy
>>>     //
>>>     HiveContext.sql("DROP TABLE IF EXISTS oraclehadoop.dummy")
>>>     var sqltext: String = ""
>>>     sqltext = """
>>>     CREATE TABLE oraclehadoop.dummy (
>>>        ID INT
>>>      , CLUSTERED INT
>>>      , SCATTERED INT
>>>      , RANDOMISED INT
>>>      , RANDOM_STRING VARCHAR(50)
>>>      , SMALL_VC VARCHAR(10)
>>>      , PADDING VARCHAR(10)
>>>     )
>>>     CLUSTERED BY (ID) INTO 256 BUCKETS
>>>     STORED AS ORC
>>>     TBLPROPERTIES (
>>>       "orc.create.index"="true",
>>>       "orc.bloom.filter.columns"="ID",
>>>       "orc.bloom.filter.fpp"="0.05",
>>>       "orc.compress"="SNAPPY",
>>>       "orc.stripe.size"="16777216",
>>>       "orc.row.index.stride"="10000" )
>>>     """
>>>     HiveContext.sql(sqltext)
>>>     //
>>>     // Put data in Hive table. Clean up is already done
>>>     //
>>>     sqltext = """
>>>     INSERT INTO TABLE oraclehadoop.dummy
>>>     SELECT
>>>        ID
>>>      , CLUSTERED
>>>      , SCATTERED
>>>      , RANDOMISED
>>>      , RANDOM_STRING
>>>      , SMALL_VC
>>>      , PADDING
>>>     FROM tmp
>>>     """
>>>     HiveContext.sql(sqltext)
>>>     println("\nFinished at"); sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>>     sys.exit()
>>>   }
>>> }
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> On 10 June 2016 at 23:38, Ajay Chander <itsche...@gmail.com> wrote:
>>>
>>>> Hi Mich,
>>>>
>>>> Thanks for the response. If you look at my programs, I am not writing
>>>> my queries to include column names in a pair of "". The driver in my
>>>> Spark program is generating such a query with the column names in "",
>>>> which I do not want. On the other hand, I am using the same driver in
>>>> my pure Java program (attached); in that program the same driver is
>>>> generating a proper SQL query without "".
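[Editor's note: the contrast in the two log excerpts that follow is that the pure Java path sends exactly the SQL string the programmer wrote, while the Spark JDBC path composes its own SELECT list from the resolved schema and, in the version logged here, wraps each column name in double quotes. A minimal sketch of that composition; the class and helper names are made up for illustration, this is not Spark's actual code:]

```java
import java.util.List;
import java.util.stream.Collectors;

public class QuotedProjection {
    // Roughly how a JDBC data source can build its projection: join the
    // resolved column names, optionally wrapping each in ANSI double quotes.
    static String buildSelect(List<String> columns, String table, boolean quoteIdentifiers) {
        String cols = columns.stream()
                .map(c -> quoteIdentifiers ? "\"" + c + "\"" : c)
                .collect(Collectors.joining(","));
        return "SELECT " + cols + " FROM " + table;
    }

    public static void main(String[] args) {
        List<String> cols = List.of("SR_NO", "start_dt", "end_dt");
        // Quoted identifiers, matching the Spark-side log:
        System.out.println(buildSelect(cols, "sasLib.run_control", true));
        // prints: SELECT "SR_NO","start_dt","end_dt" FROM sasLib.run_control
        // Bare identifiers, matching the pure Java program's query:
        System.out.println(buildSelect(cols, "sasLib.run_control", false));
        // prints: SELECT SR_NO,start_dt,end_dt FROM sasLib.run_control
    }
}
```

Whether the quoting step runs is the only difference between the two queries in the logs below; the driver underneath is the same in both cases.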
>>>>
>>>> Pure Java log:
>>>>
>>>> [2016-06-10 10:35:21,584] INFO stmt(1.1)#executeQuery SELECT
>>>> a.sr_no,a.start_dt,a.end_dt FROM sasLib.run_control a; created result
>>>> set 1.1.1; time= 0.122 secs (com.sas.rio.MVAStatement:590)
>>>>
>>>> Spark SQL log:
>>>>
>>>> [2016-06-10 10:29:05,834] INFO conn(2)#prepareStatement sql=SELECT
>>>> "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; prepared
>>>> statement 2.1; time= 0.038 secs (com.sas.rio.MVAConnection:538)
>>>>
>>>> [2016-06-10 10:29:05,935] INFO ps(2.1)#executeQuery SELECT
>>>> "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; created result
>>>> set 2.1.1; time= 0.102 secs (com.sas.rio.MVAStatement:590)
>>>>
>>>> Please find the complete program and full logs attached in the thread
>>>> below. Thank you.
>>>>
>>>> Regards,
>>>> Ajay
>>>>
>>>> On Friday, June 10, 2016, Mich Talebzadeh <mich.talebza...@gmail.com>
>>>> wrote:
>>>>
>>>>> Assuming I understood your query: in Spark SQL (that is, when you log
>>>>> in to Spark SQL like spark-sql --master spark://<HOST_NAME>:7077) you
>>>>> do not need double quotes around column names for the SQL to work.
>>>>>
>>>>> spark-sql> select "hello from Mich" from oraclehadoop.sales limit 1;
>>>>> hello from Mich
>>>>>
>>>>> Anything between a pair of "" will be interpreted as text, NOT a
>>>>> column name.
>>>>>
>>>>> In Spark SQL you do not need double quotes. So simply:
>>>>>
>>>>> spark-sql> select prod_id, cust_id from sales limit 2;
>>>>> 17      28017
>>>>> 18      10419
>>>>>
>>>>> HTH
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>> On 10 June 2016 at 21:54, Ajay Chander <itsche...@gmail.com> wrote:
>>>>>
>>>>>> Hi again, has anyone in this group tried to access a SAS dataset
>>>>>> through Spark SQL?
>>>>>> Thank you
>>>>>>
>>>>>> Regards,
>>>>>> Ajay
>>>>>>
>>>>>> On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Spark Users,
>>>>>>>
>>>>>>> I hope everyone here is doing great.
>>>>>>>
>>>>>>> I am trying to read data from SAS through Spark SQL and write it
>>>>>>> into HDFS. Initially, I started with a pure Java program; please
>>>>>>> find the program and logs in the attached file sas_pure_java.txt.
>>>>>>> The program ran successfully and it returned the data from SAS.
>>>>>>> Please note the highlighted part in the log.
>>>>>>>
>>>>>>> My SAS dataset has 4 rows.
>>>>>>>
>>>>>>> The program ran successfully, so my output is:
>>>>>>>
>>>>>>> [2016-06-10 10:35:21,584] INFO stmt(1.1)#executeQuery SELECT
>>>>>>> a.sr_no,a.start_dt,a.end_dt FROM sasLib.run_control a; created
>>>>>>> result set 1.1.1; time= 0.122 secs (com.sas.rio.MVAStatement:590)
>>>>>>>
>>>>>>> [2016-06-10 10:35:21,630] INFO rs(1.1.1)#next (first call to next);
>>>>>>> time= 0.045 secs (com.sas.rio.MVAResultSet:773)
>>>>>>>
>>>>>>> 1,'2016-01-01','2016-01-31'
>>>>>>> 2,'2016-02-01','2016-02-29'
>>>>>>> 3,'2016-03-01','2016-03-31'
>>>>>>> 4,'2016-04-01','2016-04-30'
>>>>>>>
>>>>>>> Please find the full logs attached to this email in the file
>>>>>>> sas_pure_java.txt.
>>>>>>>
>>>>>>> _______________________
>>>>>>>
>>>>>>> Now I am trying to do the same via Spark SQL. Please find my
>>>>>>> program and logs attached to this email in the file
>>>>>>> sas_spark_sql.txt.
>>>>>>>
>>>>>>> The connection to the SAS dataset is established successfully. But
>>>>>>> please note the highlighted log below.
>>>>>>>
>>>>>>> [2016-06-10 10:29:05,834] INFO conn(2)#prepareStatement sql=SELECT
>>>>>>> "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; prepared
>>>>>>> statement 2.1; time= 0.038 secs (com.sas.rio.MVAConnection:538)
>>>>>>>
>>>>>>> [2016-06-10 10:29:05,935] INFO ps(2.1)#executeQuery SELECT
>>>>>>> "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; created
>>>>>>> result set 2.1.1; time= 0.102 secs (com.sas.rio.MVAStatement:590)
>>>>>>>
>>>>>>> Please find the full logs attached to this email in the file
>>>>>>> sas_spark_sql.txt.
>>>>>>>
>>>>>>> I am using the same driver in both the pure Java and Spark SQL
>>>>>>> programs, but the query generated in Spark SQL has quotes around
>>>>>>> the column names (highlighted above). So my resulting output for
>>>>>>> that query looks like this:
>>>>>>>
>>>>>>> +-----+--------+------+
>>>>>>> |  _c0|     _c1|   _c2|
>>>>>>> +-----+--------+------+
>>>>>>> |SR_NO|start_dt|end_dt|
>>>>>>> |SR_NO|start_dt|end_dt|
>>>>>>> |SR_NO|start_dt|end_dt|
>>>>>>> |SR_NO|start_dt|end_dt|
>>>>>>> +-----+--------+------+
>>>>>>>
>>>>>>> Since both programs are using the same driver,
>>>>>>> com.sas.rio.MVADriver, the expected output should be the same as my
>>>>>>> pure Java program's output. But something else is happening behind
>>>>>>> the scenes.
>>>>>>>
>>>>>>> Any insights on this issue? Thanks for your time.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Ajay
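[Editor's note: the four-identical-rows table is consistent with the SAS side treating ANSI double-quoted tokens as string literals rather than identifiers: evaluating SELECT "SR_NO",... then yields the literal text once per row of the four-row table. A toy model of the two interpretations; this is illustrative only, not the SAS driver's actual behavior or code:]

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralVsIdentifier {
    // Toy model: evaluate one SELECT item against in-memory rows.
    // An ANSI-compliant engine treats "X" as the identifier X;
    // an engine that treats "X" as a string literal echoes the text X.
    static List<String> evalItem(String item, List<List<String>> rows,
                                 List<String> columnNames, boolean ansiQuoting) {
        boolean quoted = item.startsWith("\"") && item.endsWith("\"");
        String name = quoted ? item.substring(1, item.length() - 1) : item;
        List<String> out = new ArrayList<>();
        for (List<String> row : rows) {
            if (quoted && !ansiQuoting) {
                out.add(name);                                    // literal: echo the text
            } else {
                out.add(row.get(columnNames.indexOf(name)));      // identifier: fetch the value
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cols = List.of("SR_NO", "start_dt", "end_dt");
        List<List<String>> rows = List.of(
            List.of("1", "2016-01-01", "2016-01-31"),
            List.of("2", "2016-02-01", "2016-02-29"),
            List.of("3", "2016-03-01", "2016-03-31"),
            List.of("4", "2016-04-01", "2016-04-30"));
        // ANSI interpretation: the four SR_NO values.
        System.out.println(evalItem("\"SR_NO\"", rows, cols, true));
        // prints: [1, 2, 3, 4]
        // Literal interpretation: the header text repeated per row,
        // matching the table in the log above.
        System.out.println(evalItem("\"SR_NO\"", rows, cols, false));
        // prints: [SR_NO, SR_NO, SR_NO, SR_NO]
    }
}
```

If that reading is right, the lever is stopping Spark from quoting the projection. Ajay reports above that writing the SELECT himself (via the dbtable option) did not help, which fits, since Spark still wraps a user-supplied subquery in its own quoted projection; depending on the Spark version, registering a custom org.apache.spark.sql.jdbc.JdbcDialect whose quoteIdentifier returns the column name unquoted for SAS JDBC URLs would be the remaining option.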