hive table with large column data size
Hi,

I want to store binary data (such as images) in a Hive table, but the binary column may be much larger than the other columns in each row, and I'm worried about query performance. One approach I can think of is to separate the binary data from the other columns: create two Hive tables, run two separate Spark queries, and join the results later.

Later, I found that Parquet supports splitting columns into separate column chunks, as described here: https://parquet.apache.org/documentation/latest/

I'm wondering if Spark SQL already supports that? If so, how do I use it?

Weide
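A sketch of how this could look, assuming Spark's built-in Parquet support (this is not runnable standalone; it assumes an existing SparkContext `sc` and a DataFrame `df`, and the table name `images` and column names are made up for illustration). Because Parquet is columnar, a query that only references the small columns should not read the binary column chunks at all, so the two-table-plus-join workaround may be unnecessary:

```python
# Sketch only -- assumes a running Spark 1.5 cluster, an existing
# SparkContext `sc`, and a DataFrame `df` with columns
# (id, label, image); table/column names are hypothetical.
from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)

# One table; Parquet stores each column in its own column chunks.
df.write.format("parquet").saveAsTable("images")

# Column pruning: this scan should only touch the small columns,
# skipping the large binary column on disk.
small = sqlContext.sql("SELECT id, label FROM images")

# Pay the I/O cost of the binary column only when it is needed.
blob = sqlContext.sql("SELECT image FROM images WHERE id = 42")
```

The key point is that the split happens at the Parquet storage layer, so a single table can behave like the two-table design for queries that don't select the binary column.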
how to run unit test for specific component only
Hi,

I am wondering how to run the unit tests for a specific Spark component only:

mvn test -DwildcardSuites="org.apache.spark.sql.*" -Dtest=none

The above command doesn't seem to work. I'm using Spark 1.5.

Thanks,
Weide
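One likely issue, assuming Spark's scalatest-maven-plugin setup: `wildcardSuites` takes package prefixes or fully qualified suite names, not glob patterns, so the trailing `.*` should be dropped. A command-line sketch (module and suite names here are examples; adjust to your checkout):

```shell
# Run all ScalaTest suites under a package prefix (no ".*" glob),
# in the sql/core module only; -Dtest=none skips the Java tests.
build/mvn -pl sql/core test -DwildcardSuites=org.apache.spark.sql -Dtest=none

# Or run a single suite by its fully qualified name:
build/mvn -pl sql/core test \
  -DwildcardSuites=org.apache.spark.sql.SQLQuerySuite -Dtest=none
```

Note that the module's dependencies generally need to be built first (e.g. `build/mvn package -DskipTests` once at the top level) before `-pl` runs cleanly.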
get host from rdd map
In an RDD map function, is there a way I can know the host name(s) where the map runs? Any code sample would be appreciated.

thx,

Weide
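One simple way, sketched below: call the standard-library `socket.gethostname()` inside the function you pass to `rdd.map` (or `mapPartitions`). Since that function executes on the executors, it returns the worker node's hostname for each record. The Spark call itself is shown in a comment; the runnable part uses a plain-Python stand-in:

```python
import socket

def tag_with_host(record):
    # Inside rdd.map this runs on an executor, so gethostname()
    # returns the worker node's hostname. (Run on the driver, it
    # returns the local machine's name instead.)
    return (socket.gethostname(), record)

# In Spark:  rdd.map(tag_with_host).collect()
# Plain-Python stand-in for illustration:
tagged = [tag_with_host(x) for x in range(3)]
```

If you need the full list of hosts that actually touched the data, collecting the distinct first elements of these pairs gives it to you after the job runs; Spark does not expose the placement up front from inside `map`.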
Re: get host from rdd map
Yeah, my use case is that I want to have some external communication from where the RDD map is being run. The external communication would be handled separately, transparent to Spark. What would be the hacky way and the non-hacky way to do that? :)

Weide

On Fri, Oct 23, 2015 at 5:32 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Can you outline your use case a bit more?
>
> Do you want to know all the hosts which would run the map?
>
> Cheers
>
> On Fri, Oct 23, 2015 at 5:16 PM, weoccc <weo...@gmail.com> wrote:
>> In an RDD map function, is there a way I can know the host name(s)
>> where the map runs? Any code sample would be appreciated.
>>
>> thx,
>>
>> Weide
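For external communication from inside a Spark job, the usual non-hacky pattern is `rdd.mapPartitions`: open one connection per partition rather than per record, and close it when the partition is done. A minimal sketch, with a fake connection class standing in for whatever real client you'd use (the `FakeConnection` class and its `send`/`close` methods are hypothetical stand-ins, not a Spark API):

```python
class FakeConnection:
    """Stand-in for a real client (e.g. an HTTP or TCP client)."""
    def __init__(self):
        self.sent = []

    def send(self, record):
        self.sent.append(record)

    def close(self):
        pass

def send_partition(records):
    # One connection per partition, not per record -- this amortizes
    # connection setup and keeps the external traffic transparent to
    # Spark, which only sees an ordinary transformation.
    conn = FakeConnection()
    try:
        for r in records:
            conn.send(r)
            yield r
    finally:
        conn.close()

# In Spark this would be:  rdd.mapPartitions(send_partition)
# Plain-Python stand-in over one "partition":
out = list(send_partition([1, 2, 3]))
```

The hacky alternative would be opening a connection inside a plain `map` closure for every record, which works but multiplies connection overhead by the record count.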