Re: UDFClassLoader isolation leaking

2018-09-14 Thread Jason Gerlowski
Hi Gopal,

Thanks for taking a look, and for the workaround suggestion.

Your workaround worked for the original SerDe we encountered this
issue with.  With the stub StorageHandler mentioned above it produced
a different error (https://pastebin.com/iu0hh21C).  But I suspect
that's a problem with the StorageHandler being a bit *too* minimal.
I'll see if I can't correct that later today.

Thanks again,

Jason
On Thu, Sep 13, 2018 at 10:13 PM Gopal Vijayaraghavan wrote:
>
> Hi,
>
> > Hopefully someone can tell me if this is a bug, expected behavior, or 
> > something I'm causing myself :)
>
> I don't think this is expected behaviour, but where the bug lies is what I'm 
> looking into.
>
> >  We have a custom StorageHandler that we're updating from Hive 1.2.1 to 
> > Hive 3.0.0.
>
> Most likely this bug existed in Hive 1.2 as well, but the FetchTask 
> conversion did not happen for these queries.
>
> I'll probably test out your SerDe tomorrow, but I have two target cases to 
> look into right now.
>
> The first one is that this is related to a different issue I noticed with 
> Hadoop-Common code (i.e. a direct leak).
>
> https://issues.apache.org/jira/browse/HADOOP-10513
>
> The second one is that this is only broken with the Local FetchTask (which 
> gets triggered when you run "select ... limit n").
>
> > SELECT * FROM my_ext_table;
>
> To test those theories, I recommend trying out
>
> set hive.fetch.task.conversion=none;
>
> and running the same query so that the old Hive1 codepaths for reading from 
> the SerDe get triggered.
>
> Cheers,
> Gopal
>
>


UDFClassLoader isolation leaking

2018-09-13 Thread Jason Gerlowski
Hi all,

Wanted to let you know of a potential bug I've run into when loading
custom jars dynamically (i.e. "ADD JAR /path/to/jar").  Hopefully
someone can tell me if this is a bug, expected behavior, or something
I'm causing myself :)

We have a custom StorageHandler that we're updating from Hive 1.2.1 to
Hive 3.0.0.  During testing we found that under some circumstances,
queries to tables backed by our StorageHandler would return result
sets with 'NULL' in each cell.  Digging in, we found that our SerDe's
deserialize() method was returning null after a failed "instanceof"
sanity check on the input Writable.  Debugging a bit, we found that
the "instanceof" operands were the same class/package, but had been
loaded by two different UDFClassLoader instances.  This behavior seems
suspiciously like what was warned against in an early comment on
HIVE-11878 when UDFClassLoader was introduced, so I'm 99% sure it is
unintended. (see:
https://issues.apache.org/jira/browse/HIVE-11878?focusedCommentId=14876858&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14876858)

The behavior is reproducible with the following steps:

1. Find a custom StorageHandler to use.  I wrote a stub StorageHandler
here (https://github.com/gerlowskija/hive-bug-serde/) which reproduces
the issue.
2. Create a table using the StorageHandler: hive -n $hive_user -p
$hive_pass -e "ADD JAR /tmp/mycustomserde.jar; CREATE EXTERNAL TABLE
my_ext_table (hello_col STRING, world_col STRING) STORED BY
'com.helloworld.serde.HelloWorldStorageHandler' LOCATION
'/tmp/some_dir';"
3. Put some data in your external table: hive -n $hive_user -p
$hive_pass -e "ADD JAR /tmp/mycustomserde.jar; INSERT INTO
my_ext_table VALUES ('hello', 'world');"
4. Query your external table: hive -n $hive_user -p $hive_pass -e "ADD
JAR /tmp/mycustomserde.jar; SELECT * FROM my_ext_table;"

Depending on the custom SerDe you're using, the bug might manifest
differently.  But most SerDes that cast the "Writable" arg
to a specific Writable implementation in their deserialize method
will print a table full of 'NULL' values.  (The provided stub
StorageHandler shows the bug this way.  It also logs the "instanceof"
operands out to hiveserver2.log, making the behavior clearer:
"Received unexpected Writable class.  Expected
com.helloworld.serde.HelloWorldWritable from classloader
org.apache.hadoop.hive.ql.exec.UDFClassLoader@489d24e9, but actually
was com.helloworld.serde.HelloWorldWritable from classloader
org.apache.hadoop.hive.ql.exec.UDFClassLoader@75517e2b").
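
The kind of diagnostic quoted above can be sketched without any Hive
dependencies.  This is only an illustration, not the stub
StorageHandler's actual code: ExpectedWritable, describe() and
explainMismatch() are hypothetical names, and ExpectedWritable merely
stands in for the Writable type a SerDe expects.

```java
public class WritableGuardDemo {
    /** Hypothetical stand-in for the concrete Writable a SerDe expects. */
    public static class ExpectedWritable {}

    /** Formats a class name together with the loader that defined it. */
    static String describe(Class<?> type) {
        return type.getName() + " from classloader " + type.getClassLoader();
    }

    /**
     * Returns null when the input has the expected type; otherwise returns
     * a diagnostic naming both classes and their defining loaders.
     */
    static String explainMismatch(Object input) {
        if (input instanceof ExpectedWritable) {
            return null;
        }
        String actual = (input == null) ? "null" : describe(input.getClass());
        return "Received unexpected Writable class.  Expected "
                + describe(ExpectedWritable.class) + ", but actually was " + actual;
    }

    public static void main(String[] args) {
        // Matching type: no diagnostic is produced.
        System.out.println(explainMismatch(new ExpectedWritable()));
        // Non-matching type: the message names both classes and their loaders.
        System.out.println(explainMismatch("not a writable"));
    }
}
```

Logging the defining loader next to the class name is what makes a
"same name, different loader" mismatch visible; the class name alone
would look identical on both sides.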

I've written the behavior and reproduction steps up in more detail
here: https://github.com/gerlowskija/hive-bug-serde/.  Please let me
know if this is a true bug in Hive as I suspect, or if there's
something I can be doing to avoid these Classloader conflicts.

Thanks,

Jason


Re: Queries to custom serde return 'NULL' until hiveserver2 restart

2018-09-11 Thread Jason Gerlowski
Hi all,

Thanks for the suggestion Gopal.  It turns out the error occurs on
both "SELECT *" and "SELECT col" queries.  The only queries that
seem safe are those with aggregations or other things that cause them
to be run as mr tasks (e.g. "SELECT SUM(price_f) FROM
my_external_table").  Logging out the column names as you suggested
doesn't turn up anything unexpected either.

I've tracked the unexpected 'NULL' values down to an early exit from
my SerDe's deserialize() method.  The first thing deserialize() does
is make sure that the received Writable can be cast to the particular
type it expects (LWDocumentWritable).  In my case, this instanceof
check is failing.  The method returns 'null', which gets displayed as
NULL in HiveCLI. (code pointer here:
https://github.com/lucidworks/hive-solr/blob/master/solr-hive-core/src/main/java/com/lucidworks/hadoop/hive/LWSerDe.java#L55)

Curious about what other Writable I could've been receiving, I logged
out the class details.  The name of the class matches the class I'm
expecting (and checking for with 'instanceof').  Some more logging
showed that the class definitions were identical, but that the classes
came from different UDFClassLoaders, and were thus being treated as
different classes!  I thought (partially from the UDFClassLoader
itself) that each Hive session had access to one (and only one)
UDFClassLoader.  But whatever passes the Writable to my SerDe's
deserialize() passes a class object loaded by a distinct
UDFClassLoader, which my SerDe then can't recognize.

(I drew this conclusion from some logging shared here:
https://pastebin.com/TwV0HPBA)
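
The underlying JVM behavior can be reproduced outside Hive.  The sketch
below is a minimal illustration under stated assumptions, not Hive's
UDFClassLoader: IsolatingLoader and Payload are hypothetical names, and
the loader simply re-defines one class from its bytes instead of
delegating to its parent.  It assumes the compiled .class files are
readable as classpath resources and a Java 9+ runtime (for
InputStream.readAllBytes).

```java
import java.io.IOException;
import java.io.InputStream;

public class LoaderIdentityDemo {
    /** Hypothetical stand-in for a custom Writable shipped in an added jar. */
    public static class Payload {}

    /** Defines its own copy of one named class; delegates everything else to the parent. */
    static class IsolatingLoader extends ClassLoader {
        private final String isolatedName;

        IsolatingLoader(String isolatedName) {
            super(LoaderIdentityDemo.class.getClassLoader());
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the class file bytes through the parent loader's resources. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** True if an object created via one loader passes an instance check against the other loader's class. */
    static boolean recognizedAcrossLoaders() throws ReflectiveOperationException {
        String name = Payload.class.getName();
        Class<?> fromA = new IsolatingLoader(name).loadClass(name);
        Class<?> fromB = new IsolatingLoader(name).loadClass(name);
        Object value = fromA.getDeclaredConstructor().newInstance();
        return fromB.isInstance(value);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("recognized across loaders: " + recognizedAcrossLoaders());
    }
}
```

Both loaders define the class from the same bytes under the same name,
yet the instance check fails, because the JVM identifies a class by its
name together with its defining loader.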

Is it a bug that my SerDe receives input from a different class
loader? Or am I misunderstanding the lifecycle/purpose of
UDFClassLoader instances?  Is there a more robust way to cast Writable
instances in a custom SerDe implementation?  Thanks in advance for any
clarification you can give.

Best,

Jason
On Mon, Sep 10, 2018 at 10:37 PM Gopal Vijayaraghavan wrote:
>
> >query the external table using HiveCLI (e.g. SELECT * FROM
> >my_external_table), HiveCLI prints out a table with the correct
>
> If the error is always on a "select *", then the issue might be the SerDe's 
> handling of included columns.
>
> Check what you get for
>
> colNames = Arrays.asList(tblProperties.getProperty(serdeConstants.LIST_COLUMNS).split(","));
>
> Or to confirm it, try doing "Select col from table" instead of "*".
>
> Cheers,
> Gopal
>
>
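
The column-list check suggested in the quoted message can be tried in
isolation.  This is a rough sketch with assumptions: it uses plain
java.util.Properties rather than a real Hive table, the literal
"columns" is assumed to be the key behind serdeConstants.LIST_COLUMNS,
and columnNames() is a hypothetical helper.  The column names come from
the example table elsewhere in this thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

public class ColumnNamesDemo {
    // Assumption: the property key that serdeConstants.LIST_COLUMNS refers to.
    static final String LIST_COLUMNS = "columns";

    /** Splits the comma-separated column list, or returns an empty list if it is absent. */
    static List<String> columnNames(Properties tblProperties) {
        String raw = tblProperties.getProperty(LIST_COLUMNS);
        if (raw == null || raw.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(raw.split(","));
    }

    public static void main(String[] args) {
        Properties tblProperties = new Properties();
        tblProperties.setProperty(LIST_COLUMNS, "hello_col,world_col");
        // Logging this list from a SerDe's initialize step shows which columns Hive passed in.
        System.out.println(columnNames(tblProperties));
    }
}
```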


Queries to custom serde return 'NULL' until hiveserver2 restart

2018-09-10 Thread Jason Gerlowski
Hi all,

Hive Version: 3.0.0
Hadoop Version: 3.1.0
Tez Version: 0.9.1

I help to maintain a Hive SerDe connecting Hive to Apache Solr.  We've
let our custom SerDe lag quite a bit behind in Hive releases (1.2.1),
but we've recently started updating the code to work with Hive 3.0.0.

After updating the code to 3.0.0, I'm seeing some odd behavior.  I'm
able to successfully create a table using our SerDe, and can INSERT
into that table (the records end up in Solr as expected).  But when I
query the external table using HiveCLI (e.g. SELECT * FROM
my_external_table), HiveCLI prints out a table with the correct
columns, and the correct number of rows, but with "NULL" for every
single square in the grid. (Example here:
https://pastebin.com/rffujvpc)  This behavior persists until I restart
HiveServer2.  After a hiveserver2 restart, HiveCLI returns the correct
data, formatted as expected.

I'm sure the problem is with our SerDe itself, but I'm not sure where
to start looking for the problem.  Can anyone point me towards any
likely culprits that might save me some debugging time, or at least
give me a place to start looking?  Is this a symptom others in the
community have stumbled on before?

Thanks for any help that anyone can provide.  (If it helps you help
me, our SerDe is openly developed, and can be found here:
https://github.com/lucidworks/hive-solr)

Best,

Jason