Hi Mich, I have a question below.
I want to join two tables and return the result based on an input value. In this case, how do we specify the lowerBound and upperBound values?

select t1.id, t1.name, t2.course, t2.qualification
from t1, t2
where t1.transactionid = 11111
and t1.id = t2.id

(11111 is a dynamic input value.)

Regards,
Rajesh

On Mon, Aug 15, 2016 at 12:05 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> If you have your RDBMS table partitioned, then you need to consider how
> much data you want to extract, in other words the result set returned by
> the JDBC call.
>
> If you want all the data, then the number of partitions specified in the
> JDBC call should be equal to the number of partitions in your RDBMS table.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On 14 August 2016 at 21:44, Ashok Kumar <ashok34...@yahoo.com> wrote:
>
>> Thank you very much sir.
>>
>> I forgot to mention that two of these Oracle tables are range
>> partitioned. In that case, what would be the optimum number of
>> partitions, if you can share?
>>
>> Warmest
>>
>> On Sunday, 14 August 2016, 21:37, Mich Talebzadeh
>> <mich.talebza...@gmail.com> wrote:
>>
>> If you have primary keys on these tables then you can parallelise the
>> process of reading the data.
>>
>> You have to be careful not to set the number of partitions too high.
>> Certainly there is a balance between the number of partitions supplied
>> to JDBC and the load on the network and the source DB.
>>
>> Assuming that your underlying table has primary key ID, this will create
>> 20 parallel connections to the Oracle DB:
>>
>> val d = HiveContext.read.format("jdbc").options(
>>   Map("url" -> _ORACLEserver,
>>       "dbtable" -> "(SELECT <COL1>, <COL2>, ... FROM <TABLE>)",
>>       "partitionColumn" -> "ID",
>>       "lowerBound" -> "1",
>>       "upperBound" -> "maxID",
>>       "numPartitions" -> "20",
>>       "user" -> _username,
>>       "password" -> _password)).load
>>
>> assuming your upper bound on ID is maxID.
>>
>> This will open multiple connections to the RDBMS, each getting a subset
>> of the data that you want.
>>
>> You need to test it to ensure that numPartitions is optimal and that you
>> don't overload any component.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> On 14 August 2016 at 21:15, Ashok Kumar <ashok34...@yahoo.com.invalid> wrote:
>>
>> Hi,
>>
>> There are 4 tables ranging from 10 million to 100 million rows, but they
>> all have primary keys.
>>
>> The network is fine, but our Oracle is RAC and we can only connect to a
>> designated Oracle node (where we have a DQ account only).
>>
>> We have a limited time window of a few hours to get the required data out.
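[Editorial note: one point that often trips people up with the options above is that lowerBound and upperBound do not filter rows; they only control how the partition column's range is split into strides, so rows outside the bounds still land in the first and last partitions. A simplified sketch of that stride logic (not Spark's exact internal implementation, which also handles rounding and lives in its JDBCRelation class) might look like this:]

```scala
// Simplified sketch of how a Spark-style JDBC source could split a
// numeric partition column into per-partition WHERE clauses.
// The first partition is open below (and catches NULLs) and the last
// is open above, so no rows are lost if the bounds are too tight.
object PartitionSketch {
  def wherePredicates(col: String, lower: Long, upper: Long,
                      numPartitions: Int): Seq[String] = {
    val stride = (upper - lower) / numPartitions
    (0 until numPartitions).map { i =>
      val lo = lower + i * stride
      val hi = lo + stride
      if (i == 0)                      s"$col < $hi or $col is null"
      else if (i == numPartitions - 1) s"$col >= $lo"
      else                             s"$col >= $lo AND $col < $hi"
    }
  }
}
```

[With col = "ID", lower = 1, upper = 100 and 4 partitions, the stride is 24, so the four parallel queries would carry predicates like "ID < 25 or ID is null", "ID >= 25 AND ID < 49", and so on. This is why a stale upperBound only skews partition sizes rather than dropping rows.]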
>> Thanks
>>
>> On Sunday, 14 August 2016, 21:07, Mich Talebzadeh
>> <mich.talebza...@gmail.com> wrote:
>>
>> How big are your tables, and is there any issue with the network between
>> your Spark nodes and your Oracle DB that adds to the problem?
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> On 14 August 2016 at 20:50, Ashok Kumar <ashok34...@yahoo.com.invalid> wrote:
>>
>> Hi Gurus,
>>
>> I have a few large tables in an RDBMS (ours is Oracle). We want to
>> access these tables through Spark JDBC.
>>
>> What is the quickest way of getting the data into a Spark DataFrame,
>> say with multiple connections from Spark?
>>
>> Thanking you
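[Editorial note: coming back to Rajesh's question at the top of the thread, the join and the dynamic transactionid filter can both be pushed down inside the dbtable subquery while Spark still parallelises the read on t1.id. A minimal sketch follows; the table and column names (t1, t2, transactionid) come from the question, while the URL, bounds, and partition count are placeholder assumptions. In practice you would fetch the real bounds beforehand, e.g. with a single-connection SELECT MIN(id), MAX(id) FROM t1:]

```scala
// Hypothetical sketch: build the pushed-down join as a dbtable
// subquery with a dynamic transaction id. The "tmp" alias keeps the
// inline view valid when Spark wraps it in its own SELECT.
def joinQuery(transactionId: Long): String =
  s"""(SELECT t1.id, t1.name, t2.course, t2.qualification
     |   FROM t1 JOIN t2 ON t1.id = t2.id
     |  WHERE t1.transactionid = $transactionId) tmp""".stripMargin

// Assemble the JDBC options; minId/maxId are assumed to have been
// fetched from the source table in a prior query.
def jdbcOptions(url: String, user: String, password: String,
                transactionId: Long, minId: Long, maxId: Long): Map[String, String] =
  Map(
    "url"             -> url,
    "dbtable"         -> joinQuery(transactionId),
    "partitionColumn" -> "id",
    "lowerBound"      -> minId.toString,
    "upperBound"      -> maxId.toString,
    "numPartitions"   -> "20",
    "user"            -> user,
    "password"        -> password)

// Usage (needs a live SparkSession and a reachable Oracle instance):
// val df = spark.read.format("jdbc")
//   .options(jdbcOptions(oracleUrl, user, pass, 11111L, minId, maxId))
//   .load()
```

[Note the bounds describe the spread of t1.id, not the filter value; the transactionid predicate does the actual row selection inside Oracle.]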