Yes, that was the idea: cache the tables in memory, as they should fit comfortably. The loading time is no problem, as the job is not time-critical. The critical point is the constant access to the DB2 tables, which consumes costly MIPS, and this is what I hope to replace with the cached version. So I'll definitely give it a try :)
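Roughly what I have in mind, as a minimal sketch (the connection URL, credentials and table names below are placeholders, and the DB2 JDBC driver jar has to be on the classpath):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("db2-cache-test")
  .getOrCreate()

// Placeholder DB2 connection details.
val jdbcUrl = "jdbc:db2://db2host:50000/ORDERS"

// Load one of the small lookup tables once over JDBC ...
val customers = spark.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "SCHEMA.CUSTOMERS") // placeholder table name
  .option("user", "user")
  .option("password", "password")
  .option("driver", "com.ibm.db2.jcc.DB2Driver")
  .load()

// ... then cache it in cluster memory and expose it to Spark SQL, so the
// processing steps hit the cache instead of DB2 (and its MIPS).
customers.cache()
customers.createOrReplaceTempView("customers")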



On 6 July 2016 at 21:59:28 CEST, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Well, you can try it. I have done it with Oracle, SAP Sybase IQ etc., but you need to be aware of the time the JDBC connection is going to take to load the data. Sounds like your tables are pretty small, so they can be cached. Where are you going to store the result set etc.?

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
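On the "where to store the result set" question, a sketch of two options, building on the placeholders in the code above (`result` stands for a final joined DataFrame and is itself an assumption):

import java.util.Properties

// `jdbcUrl` is the placeholder connection string from the sketch above.
val connectionProps = new Properties()
connectionProps.setProperty("user", "user")
connectionProps.setProperty("password", "password")
connectionProps.setProperty("driver", "com.ibm.db2.jcc.DB2Driver")

// Option 1: persist the result as Parquet on shared storage.
result.write.mode("overwrite").parquet("/data/orders/enriched")

// Option 2: append the (presumably small) result back into a DB2 table.
result.write.mode("append").jdbc(jdbcUrl, "SCHEMA.ORDERS_ENRICHED", connectionProps)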
On 6 July 2016 at 20:54, Andreas Bauer <dabuks...@gmail.com> wrote:

In fact, yes.

On 6 July 2016 at 21:46:34 CEST, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

So you want to use Spark as the query engine accessing DB2 tables via JDBC?

Dr Mich Talebzadeh
On 6 July 2016 at 20:39, Andreas Bauer <dabuks...@gmail.com> wrote:

The SQL statements are embedded in a PL/1 program using DB2 running on z/OS. Quite powerful, but expensive and, foremost, shared with other jobs in the company. The whole job takes approx. 20 minutes. So I was thinking to use Spark and let the Spark job run on 10 or 20 virtual instances, which I can spawn easily, on demand and almost for free, using a cloud infrastructure.

On 6 July 2016 at 21:29:53 CEST, Jean Georges Perrin <j...@jgp.net> wrote:

What are you doing it on right now?

> On Jul 6, 2016, at 3:25 PM, dabuki wrote:
>
> I was thinking about replacing a legacy batch job with Spark, but I'm not
> sure if Spark is suited for this use case. Before I start the proof of
> concept, I wanted to ask for opinions.
>
> The legacy job works as follows: A file (100k - 1 million entries) is iterated.
> Every row contains a (book) order with an id, and for each row approx. 15
> processing steps have to be performed that involve access to multiple
> database tables. In total, approx. 25 tables (each containing 10k-700k
> entries) have to be scanned using the book's id, and the retrieved data is
> joined together.
>
> As I'm new to Spark, I'm not sure if I can leverage Spark's processing model
> for this use case.
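A rough sketch of how this job could map onto Spark, under the assumptions discussed in the thread (the order file arrives as CSV, the ~25 lookup tables are loaded via JDBC and cached as temp views as in the sketch at the top; every name, column and path here is invented for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("order-batch").getOrCreate()

// The input file: 100k - 1 million rows, one (book) order id per row.
val orders = spark.read
  .option("header", "true")
  .csv("/data/orders/orders.csv") // placeholder path and format

// One of the ~25 lookup tables (10k-700k rows each), previously loaded
// over JDBC and cached as a temp view as in the sketch further up.
val books = spark.table("books")

// Tables this small can be broadcast, so each of the 10-20 executors
// joins locally instead of shuffling; one join per lookup table.
val step1 = orders.join(broadcast(books), Seq("book_id"), "left")

// ... the remaining ~14 steps would chain further joins/transformations ...
step1.write.mode("overwrite").parquet("/data/orders/enriched")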
