Hi,

Can you look at Apache Drill as a SQL engine on Hive?

Lohith
---- Tapan Upadhyay wrote ----

Thank you, everyone, for the guidance.

Jörn, our motivation is to move the bulk of our ad hoc queries to Hadoop so that we have enough bandwidth on our DB for important batch jobs and queries. For implementing a lambda architecture, is it possible to get real-time updates from Teradata for every insert/update/delete? From the DB logs?

Deepak, should we query the data from Cassandra using Spark? How will that differ in performance from storing our data in Hive tables (Parquet) and querying them with Spark? If there is not much performance gain, why add one more layer of processing?

Mich, we plan to sync the data using Sqoop hourly/EOD jobs. We have not yet decided how frequently we will need to do that; it will be based on user requirements. If users need real-time data, do we need to think of an alternative? How are you doing the same for Sybase? How do you sync in real time?

Thank you!!

Regards,
Tapan Upadhyay

On Wed, May 4, 2016 at 4:33 AM, Alonso Isidoro Roman <alons...@gmail.com> wrote:

I agree with Deepak, and I would try saving the data in both Parquet and Avro formats. If you can, measure the performance and choose the best; it will probably be Parquet, but you have to find out for yourself.

Alonso Isidoro Roman

2016-05-04 9:22 GMT+02:00 Jörn Franke <jornfra...@gmail.com>:

Look at lambda architecture. What is the motivation of your migration?

On 04 May 2016, at 03:29, Tapan Upadhyay <tap...@gmail.com> wrote:

Hi,

We are planning to move our ad hoc queries from Teradata to Spark.
We have a huge volume of queries during the day. What is the best way to go about it?

1) Read data directly from the Teradata DB using Spark JDBC.
2) Import data using Sqoop EOD jobs into Hive tables stored as Parquet, and then run queries on the Hive tables using Spark SQL or the Spark HiveContext.

Are there any other ways to do this better or more efficiently? Please guide.

Regards,
Tapan
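For comparison, here is a minimal sketch of what the two options could look like from PySpark. It is an illustration under assumptions, not a tested setup: the host, database, table, and column names are hypothetical, the URL form and driver class are the ones commonly used with the Teradata JDBC driver and should be checked against your driver version, and `spark` stands for whatever session or context object your Spark version provides. Credentials are left out; they would be passed as extra `user` and `password` options.

```python
from typing import Dict


def teradata_jdbc_options(host: str, database: str, table: str,
                          partition_column: str, lower: int, upper: int,
                          num_partitions: int) -> Dict[str, str]:
    """Build options for Spark's generic JDBC data source (option 1).

    partitionColumn/lowerBound/upperBound/numPartitions tell Spark to split
    the read into parallel range queries on a numeric column. The bounds only
    decide how the ranges are cut; they do not filter rows.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return {
        # Assumed URL form and driver class for the Teradata JDBC driver.
        "url": f"jdbc:teradata://{host}/DATABASE={database}",
        "driver": "com.teradata.jdbc.TeraDriver",
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


def read_from_teradata(spark, options: Dict[str, str]):
    """Option 1: every query still goes back to Teradata over JDBC."""
    return spark.read.format("jdbc").options(**options).load()


def read_from_hive_parquet(spark, table: str):
    """Option 2: query the Hive table (stored as Parquet) that Sqoop loaded."""
    return spark.table(table)


# Hypothetical values, for illustration only.
opts = teradata_jdbc_options("td-host", "sales", "orders",
                             "order_id", 1, 1000000, 8)
```

With option 1 each ad hoc query still opens `numPartitions` connections against Teradata, so the load stays on the database. With option 2 only the Sqoop import touches Teradata, and the daytime queries run against the Parquet files.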