hive table with large column data size

2022-01-09 Thread weoccc
Hi,

I want to store binary data (such as images) in a Hive table, but the binary
data column might be much larger than the other columns in each row. I'm
worried about query performance. One way I can think of is to separate the
binary data from the other columns by creating two Hive tables, running two
separate Spark queries, and joining the results later.

Later, I found that Parquet supports splitting columns into different files,
as shown here:
https://parquet.apache.org/documentation/latest/

I'm wondering whether Spark SQL already supports that. If so, how do I use it?

Weide
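
The two-table idea from the message above can be sketched as a query that
filters the small metadata table first and only then joins the matching rows
to the table holding the binary column. This is a minimal sketch, not a
confirmed answer to the question: the table names (image_meta, image_blob),
the key (image_id), the binary column (payload) and the filter are all
hypothetical, and whether this layout actually improves performance would
need to be measured.

```python
def build_join_query(meta_table, blob_table, key, blob_col, predicate):
    """Build SQL that filters the metadata table, then joins to the blob table.

    All table and column names are supplied by the caller; the ones used in
    the example below are hypothetical.
    """
    return (
        f"SELECT m.*, b.{blob_col} "
        f"FROM (SELECT * FROM {meta_table} WHERE {predicate}) m "
        f"JOIN {blob_table} b ON m.{key} = b.{key}"
    )


# Hypothetical names, for illustration only.
query = build_join_query(
    meta_table="image_meta",
    blob_table="image_blob",
    key="image_id",
    blob_col="payload",
    predicate="label = 'cat'",
)
print(query)
```

With PySpark, the resulting string could be passed to spark.sql(query); the
point of the shape is that the large binary column is only touched for rows
that survive the filter.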


how to run unit test for specific component only

2015-11-11 Thread weoccc
Hi,

I am wondering how to run unit tests for a specific Spark component only.

mvn test -DwildcardSuites="org.apache.spark.sql.*" -Dtest=none

The above command doesn't seem to work. I'm using Spark 1.5.

Thanks,

Weide


get host from rdd map

2015-10-23 Thread weoccc
In an RDD map function, is there a way I can know the list of host names
where the map runs? Any code sample would be appreciated.

thx,

Weide
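
One possible approach to the question above, sketched under assumptions: have
the mapped function itself report the host it is running on, using the
standard socket.gethostname() call. The helper name tag_with_host is
hypothetical, and the built-in map over a local list below is only a
stand-in for an RDD, so this shows the shape of the idea rather than a tested
Spark job.

```python
import socket


def tag_with_host(record):
    """Pair a record with the name of the host that is processing it."""
    return (socket.gethostname(), record)


# Local stand-in for an RDD: the built-in map plays the role of rdd.map.
tagged = list(map(tag_with_host, [1, 2, 3]))

# Distinct host names seen while mapping (a single host when run locally).
hosts = sorted({host for host, _ in tagged})
print(hosts)
```

In PySpark, the same function could presumably be used as
rdd.map(tag_with_host).keys().distinct().collect(), which would return the
hosts that actually ran map tasks. Note that this only reports hosts after
the tasks have run; it does not tell you in advance where they will be
scheduled.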


Re: get host from rdd map

2015-10-23 Thread weoccc
Yeah,

My use case is that I want to do some external communication from wherever
the RDD's map function is being run. The external communication might be
handled separately, transparently to Spark. What would be the hacky way and
the non-hacky way to do that? :)

Weide



On Fri, Oct 23, 2015 at 5:32 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you outline your use case a bit more ?
>
> Do you want to know all the hosts which would run the map ?
>
> Cheers
>
> On Fri, Oct 23, 2015 at 5:16 PM, weoccc <weo...@gmail.com> wrote:
>
>> in rdd map function, is there a way i can know the list of host names
>> where the map runs ? any code sample would be appreciated ?
>>
>> thx,
>>
>> Weide
>>
>>
>>
>