RE: Stored proc with spark
One thing to be aware is that you better convert Oracle NUMBER and NUMBER(m,n) columns to varchar (--> TO_CHAR()) at source as Spark will throw overflow errors. It is better to user TO_CHAR() in Oracle rather than writing UDF in Spark. UDFs in any language are slower compared to the generic functions. HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/> NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Technology Ltd, its subsidiaries nor their employees accept any responsibility. From: Gaurav Agarwal [mailto:gaurav130...@gmail.com] Sent: 16 February 2016 15:52 To: Mich Talebzadeh Cc: user Subject: Re: Stored proc with spark Thanks I will try with the options On Feb 16, 2016 9:15 PM, "Mich Talebzadeh" mailto:m...@peridale.co.uk> > wrote: You can use JDBC to oracle to get that data from a given table. What Oracle stored procedure does anyway? How many tables are involved? JDBC is pretty neat. In example below I use JDBC to load two Dimension tables from Oracle in Spark shell and read the FACT table of 100 million rows from Hive val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc) println ("\nStarted at"); HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss') ").collect.foreach(println) // var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb" var _username : String = "sh" var _password : String = "xx" // /Get the FACT table from Hive // var s = HiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID, CHANNEL_ID FROM oraclehadoop.sales") //Get Oracle tables via JDBC val c = HiveContext.load("jdbc", Map("url" -> _ORACLEserver, "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels)", "user" -> _username, "password" -> _password)) val t = HiveContext.load("jdbc", Map("url" -> _ORACLEserver, "dbtable" -> "(SELECT TIME_ID AS TIME_ID, CALENDAR_MONTH_DESC FROM sh.times)", "user" -> _username, "password" -> _password)) // Registar three data frames as temporary tables using registerTempTable() call s.registerTempTable("t_s") c.registerTempTable("t_c") t.registerTempTable("t_t") // var sqltext : String = "" sqltext = """ SELECT rs.Month, rs.SalesChannel, round(TotalSales,2) FROM ( SELECT t_t.CALENDAR_MONTH_DESC AS Month, t_c.CHANNEL_DESC AS SalesChannel, SUM(t_s.AMOUNT_SOLD) AS TotalSales FROM t_s, t_t, t_c WHERE t_s.TIME_ID = t_t.TIME_ID AND t_s.CHANNEL_ID = t_c.CHANNEL_ID GROUP BY t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC ORDER by t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC ) rs LIMIT 10 """ HiveContext.sql(sqltext).collect.foreach(println) println ("\nFinished at"); HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss') ").collect.foreach(println) sys.exit() HTH -- Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Cloud Technology Partners Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Cloud Technology partners Ltd, its subsidiaries nor their employees accept any responsibility. On 16/02/2016 09:04, Gaurav Agarwal wrote: Hi Can I load the data into spark from oracle storedproc Thanks
Re: Stored proc with spark
Thanks I will try with the options On Feb 16, 2016 9:15 PM, "Mich Talebzadeh" wrote: > You can use JDBC to oracle to get that data from a given table. What > Oracle stored procedure does anyway? How many tables are involved? > > JDBC is pretty neat. In example below I use JDBC to load two > Dimension tables from Oracle in Spark shell and read the FACT table of 100 > million rows from Hive > > val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc) > > println ("\nStarted at"); HiveContext.sql("SELECT > FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss') > ").collect.foreach(println) > > // > var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb" > var _username : String = "sh" > var _password : String = "xx" > // > > /Get the FACT table from Hive > // > var s = HiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID, CHANNEL_ID FROM > oraclehadoop.sales") > > //Get Oracle tables via JDBC > > val c = HiveContext.load("jdbc", > Map("url" -> _ORACLEserver, > "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM > sh.channels)", > "user" -> _username, > "password" -> _password)) > > val t = HiveContext.load("jdbc", > Map("url" -> _ORACLEserver, > "dbtable" -> "(SELECT TIME_ID AS TIME_ID, CALENDAR_MONTH_DESC FROM > sh.times)", > "user" -> _username, > "password" -> _password)) > > // Registar three data frames as temporary tables using > registerTempTable() call > > s.registerTempTable("t_s") > c.registerTempTable("t_c") > t.registerTempTable("t_t") > // > var sqltext : String = "" > sqltext = """ > SELECT rs.Month, rs.SalesChannel, round(TotalSales,2) > FROM > ( > SELECT t_t.CALENDAR_MONTH_DESC AS Month, t_c.CHANNEL_DESC AS SalesChannel, > SUM(t_s.AMOUNT_SOLD) AS TotalSales > FROM t_s, t_t, t_c > WHERE t_s.TIME_ID = t_t.TIME_ID > AND t_s.CHANNEL_ID = t_c.CHANNEL_ID > GROUP BY t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC > ORDER by t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC > ) rs > LIMIT 10 > """ > HiveContext.sql(sqltext).collect.foreach(println) > println ("\nFinished at"); HiveContext.sql("SELECT > FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss') > ").collect.foreach(println) > > sys.exit() > > > > HTH > > -- > > Dr Mich Talebzadeh > > > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > > > > http://talebzadehmich.wordpress.com > > > > NOTE: The information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this > message shall not be understood as given or endorsed by Cloud Technology > Partners Ltd, its subsidiaries or their employees, unless expressly so > stated. It is the responsibility of the recipient to ensure that this email > is virus free, therefore neither Cloud Technology partners Ltd, its > subsidiaries nor their employees accept any responsibility. > > On 16/02/2016 09:04, Gaurav Agarwal wrote: > > Hi > Can I load the data into spark from oracle storedproc > > Thanks > > > > > > >
Re: Stored proc with spark
You can use JDBC to oracle to get that data from a given table. What Oracle stored procedure does anyway? How many tables are involved? JDBC is pretty neat. In example below I use JDBC to load two Dimension tables from Oracle in Spark shell and read the FACT table of 100 million rows from Hive val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc) println ("\nStarted at"); HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss') ").collect.foreach(println) // var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb" var _username : String = "sh" var _password : String = "xx" // /Get the FACT table from Hive // var s = HiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID, CHANNEL_ID FROM oraclehadoop.sales") //Get Oracle tables via JDBC val c = HiveContext.load("jdbc", Map("url" -> _ORACLEserver, "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels)", "user" -> _username, "password" -> _password)) val t = HiveContext.load("jdbc", Map("url" -> _ORACLEserver, "dbtable" -> "(SELECT TIME_ID AS TIME_ID, CALENDAR_MONTH_DESC FROM sh.times)", "user" -> _username, "password" -> _password)) // Registar three data frames as temporary tables using registerTempTable() call s.registerTempTable("t_s") c.registerTempTable("t_c") t.registerTempTable("t_t") // var sqltext : String = "" sqltext = """ SELECT rs.Month, rs.SalesChannel, round(TotalSales,2) FROM ( SELECT t_t.CALENDAR_MONTH_DESC AS Month, t_c.CHANNEL_DESC AS SalesChannel, SUM(t_s.AMOUNT_SOLD) AS TotalSales FROM t_s, t_t, t_c WHERE t_s.TIME_ID = t_t.TIME_ID AND t_s.CHANNEL_ID = t_c.CHANNEL_ID GROUP BY t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC ORDER by t_t.CALENDAR_MONTH_DESC, t_c.CHANNEL_DESC ) rs LIMIT 10 """ HiveContext.sql(sqltext).collect.foreach(println) println ("\nFinished at"); HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss') ").collect.foreach(println) sys.exit() HTH -- Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUr V8Pw http://talebzadehmich.wordpress.com NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Cloud Technology Partners Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Cloud Technology partners Ltd, its subsidiaries nor their employees accept any responsibility. On 16/02/2016 09:04, Gaurav Agarwal wrote: Hi Can I load the data into spark from oracle storedproc Thanks
Re: Stored proc with spark
There are many facets to this topic, you could use Sqoop or the spark jdbc driver or oracle Hadoop loader or external tables in oracle that use coprocessors to stream directly to compressed csv files that are important by spark. Depends all on volumes, non-functional and functional requirements. > On 16 Feb 2016, at 10:04, Gaurav Agarwal wrote: > > Hi > Can I load the data into spark from oracle storedproc > > Thanks
Re: Stored proc with spark
relational databases? what about sqoop? https://en.wikipedia.org/wiki/Sqoop Alonso Isidoro Roman. Mis citas preferidas (de hoy) : "Si depurar es el proceso de quitar los errores de software, entonces programar debe ser el proceso de introducirlos..." - Edsger Dijkstra My favorite quotes (today): "If debugging is the process of removing software bugs, then programming must be the process of putting ..." - Edsger Dijkstra "If you pay peanuts you get monkeys" 2016-02-16 10:04 GMT+01:00 Gaurav Agarwal : > Hi > Can I load the data into spark from oracle storedproc > > Thanks >
Re: Stored proc with spark
Hi Gaurav, do you mean stored proc that returns a table? Regards, Gourav On Tue, Feb 16, 2016 at 9:04 AM, Gaurav Agarwal wrote: > Hi > Can I load the data into spark from oracle storedproc > > Thanks >
Stored proc with spark
Hi Can I load the data into spark from oracle storedproc Thanks