Re: Hive on Spark Vs Spark SQL
So it does not benefit from Project Tungsten, right?

On Mon, Nov 16, 2015 at 12:07 PM, Reynold Xin wrote:
> It's a completely different path.
>
> On Sun, Nov 15, 2015 at 10:37 PM, kiran lonikar wrote:
>> I would like to know if Hive on Spark uses or shares the execution code
>> with Spark SQL or DataFrames?
>>
>> More specifically, does Hive on Spark benefit from the changes made to
>> Spark SQL, project Tungsten? Or is it a completely different execution path
>> where it creates its own plan and executes on RDDs?
>>
>> -Kiran
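For readers following the thread, a minimal sketch of the distinction (illustrative only: the table name and query are made up, not taken from either project). The DataFrame/Spark SQL path below is planned by Catalyst and, since Spark 1.5, picks up Tungsten's memory management and code generation; Hive on Spark is switched on inside Hive itself and turns Hive's own operator plan into plain RDD work, so it never touches that machinery.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    // Spark SQL / DataFrame path: the query is planned by Catalyst and, on 1.5+,
    // benefits from Tungsten (binary row format, cache-aware sort, code generation).
    object SparkSqlPath {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("spark-sql-path"))
        val sqlContext = new HiveContext(sc)  // shares the Hive metastore, but executes via Spark SQL
        val df = sqlContext.sql("SELECT dept, avg(salary) FROM employees GROUP BY dept")
        df.show()
        sc.stop()
      }
    }

    // Hive on Spark, by contrast, is enabled inside Hive itself, e.g.:
    //   SET hive.execution.engine=spark;
    // Hive then compiles its own operator tree and submits it to Spark as plain
    // RDD jobs, so none of the Tungsten machinery above is involved.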
Hive on Spark Vs Spark SQL
I would like to know if Hive on Spark uses or shares the execution code with Spark SQL or DataFrames?

More specifically, does Hive on Spark benefit from the changes made to Spark SQL, project Tungsten? Or is it a completely different execution path where it creates its own plan and executes on RDDs?

-Kiran
Re: Code generation for GPU
Thanks for pointing to the YARN JIRA. For now, it is useful for my talk since it shows that the Hadoop and big data community is already aware of GPUs and is making an effort to exploit them. Good luck with your talk; that fear is lurking in my mind too :)

On 10-Sep-2015 2:08 pm, "Steve Loughran" wrote:
>
> On 9 Sep 2015, at 20:18, lonikar wrote:
>
>> I have seen a perf improvement of 5-10 times on expression evaluation even
>> on "ordinary" laptop GPUs. Thus, it will be a good demo along with some
>> concrete proposals for vectorization. As you said, I will have to hook up to
>> a column structure and perform computation and let the existing spark
>> computation also proceed and compare the performance.
>
> you might also be interested to know that there's now a YARN JIRA on making
> GPU another resource you can ask for:
> https://issues.apache.org/jira/browse/YARN-4122
>
> if implemented, it'd let you submit work into the cluster asking for GPUs,
> and get allocated containers on servers with the GPU capacity you need.
> This'd allow you to share GPUs with other code (including your own
> containers).
>
>> I will focus on the slides early (7th Oct is deadline), and then continue
>> the work for another 3 weeks till the summit. It still gives me enough time
>> to do considerable work. Hope your fear does not come true.
>
> good luck. And the fear is about my talk at ApacheCon on the Hadoop stack
> & Kerberos
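If the YARN-4122 work is implemented as described, a GPU request from a Spark application would look roughly like the sketch below. This is an assumption, not something available at the time of this thread: the configs shown are the resource-scheduling settings that later shipped in Spark 3.0 on top of YARN resource types (Hadoop 3.1+), and the discovery-script path is hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: spark.executor.resource.* and spark.task.resource.* come from
    // the GPU scheduling support added in Spark 3.0 (YARN resource types,
    // Hadoop 3.1+); they did not exist when this thread was written.
    object GpuResourceRequest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("gpu-expression-eval")
          .set("spark.executor.resource.gpu.amount", "1")  // ask YARN for one GPU per executor container
          .set("spark.task.resource.gpu.amount", "1")      // one task holds the GPU at a time
          // script that prints the GPU addresses visible inside the container (path is hypothetical)
          .set("spark.executor.resource.gpu.discoveryScript", "/opt/spark/scripts/getGpus.sh")
        val sc = new SparkContext(conf)
        // ... submit GPU-backed stages here ...
        sc.stop()
      }
    }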
Re: Code generation for GPU
Thanks. Yes, that's exactly what I would like to do: copy large amounts of data to GPU RAM, perform the computation, and get bulk rows back for the map/filter or reduce result. It is true that non-trivial operations benefit more. Streaming data to GPU RAM and interleaving computation with data transfer also works, but it complicates the design, and doing it in Spark would be even more so.

Thanks for bringing up the sorting. It's a good idea, since it's already isolated as you pointed out. I was looking at the terasort effort, and it is something I always wanted to take up, but I somehow thought expressions would be easier to deal with in the short term. I would love to work on that after this, especially because Unsafe is for primitive types and suited to the GPU computation model. It would be exciting to better the terasort record too.

Kiran

On 10-Sep-2015 1:12 pm, "Paul Wais" wrote:
> In order to get a major speedup from applying *single-pass* map/filter/reduce
> operations on an array in GPU memory, wouldn't you need to stream the
> columnar data directly into GPU memory somehow? You might find in your
> experiments that GPU memory allocation is a bottleneck. See e.g. John
> Canny's paper here (Section 1.1, paragraph 2):
> http://www.cs.berkeley.edu/~jfc/papers/13/BIDMach.pdf
> If the per-item operation is very non-trivial, though, a dramatic GPU
> speedup may be more likely.
>
> Something related (and perhaps easier to contribute to Spark) might be a
> GPU-accelerated sorter for sorting Unsafe records, especially since that
> stuff is already broken out somewhat well, e.g. `UnsafeInMemorySorter`.
> Spark appears to use (single-threaded) Timsort for sorting Unsafe records,
> so I imagine a multi-thread/multi-core GPU solution could handily beat
> that.
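To make the "copy a batch in, compute in bulk, copy results back" pattern concrete, here is a minimal CPU-only sketch of the batching shape being discussed. evalOnGpu is a placeholder: in a real setup it would hand the primitive array to a CUDA/OpenCL kernel through JNI. The expression (col * 2 + 1) and all names are illustrative, not actual Spark code.

    import org.apache.spark.{SparkConf, SparkContext}

    object ColumnarBatchSketch {
      // Placeholder for a GPU kernel: in practice this would copy `col` to device
      // memory, launch a kernel for the expression, and copy the result back.
      def evalOnGpu(col: Array[Double]): Array[Double] =
        col.map(v => v * 2.0 + 1.0)  // CPU stand-in for the bulk expression

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("columnar-batch-sketch"))
        val rows = sc.parallelize(1 to 1000000).map(i => (i.toLong, i.toDouble))

        // Evaluate the expression one partition (batch) at a time instead of row
        // by row, so the (simulated) device transfer is amortised over many rows.
        val result = rows.mapPartitions { iter =>
          val batch = iter.toArray                 // materialise the batch
          val keys  = batch.map(_._1)
          val col   = batch.map(_._2)              // gather one column into a primitive array
          val out   = evalOnGpu(col)               // single bulk call, not per-row
          keys.iterator.zip(out.iterator)          // scatter results back to rows
        }
        println(result.take(5).mkString(", "))
        sc.stop()
      }
    }

The mapPartitions shape is the whole point: the host-to-device transfer and kernel launch happen once per partition, which is what makes the single-pass map/filter/reduce case worthwhile at all given the allocation and transfer costs mentioned above.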