Hi,

Impala Version: 3.1.0

I have serialized data stored as bytes in a Kudu column. Each value is the
ISO-8859-1 representation of an HLL (HyperLogLog) object, which needs to be
reconstructed into an HLL object so that the merge can be performed inside a
UDA. Impala reads these values as strings, and the UDA defined for the merge
takes a string and returns a string containing the cardinality.
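The UTF-8 to ISO-8859-1 step that BytesFromStringVal performs can be sketched as below. This assumes the bytes were written as ISO-8859-1 characters and surface in Impala as UTF-8 text, so every code point lies in U+0000..U+00FF and occupies one or two UTF-8 bytes. Utf8ToLatin1 is a hypothetical name, not the actual helper from the post, and it takes a std::string instead of an Impala StringVal so it compiles without the UDF headers.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for BytesFromStringVal: decodes UTF-8 text whose code
// points are all in U+0000..U+00FF into one byte per code point, i.e. the
// original ISO-8859-1 bytes. Throws on input outside that range.
std::vector<char> Utf8ToLatin1(const std::string& utf8) {
    std::vector<char> out;
    out.reserve(utf8.size());
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80) {
            // ASCII is the same single byte in UTF-8 and ISO-8859-1.
            out.push_back(static_cast<char>(c));
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size()) {
            // U+0080..U+00FF are encoded as 0xC2/0xC3 plus one continuation byte.
            unsigned char next = static_cast<unsigned char>(utf8[i + 1]);
            if ((next & 0xC0) != 0x80) {
                throw std::invalid_argument("bad UTF-8 continuation byte");
            }
            out.push_back(static_cast<char>(((c & 0x1F) << 6) | (next & 0x3F)));
            ++i;  // skip the continuation byte we just consumed
        } else {
            throw std::invalid_argument("code point outside ISO-8859-1 range");
        }
    }
    return out;
}
```

Throwing on out-of-range input makes a silently corrupted HLL payload fail loudly instead of being fed to the builder; inside a real UDF you would report the problem through the FunctionContext instead of throwing.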

The unit tests pass on a single node, even with a large number of HLLs as
input. On a real cluster, however, the query dies almost every time. I have
been able to isolate the problem to the Update function.


IMPALA_UDF_EXPORT
void HLLUpdate(FunctionContext* context, const StringVal& src, StringVal* result) {
        if (src.is_null || result->is_null) return;

        // The intermediate buffer holds the in-memory HLL being accumulated.
        HLL* hll = reinterpret_cast<HLL*>(result->ptr);

        // Convert the value read by Impala (UTF-8) back to the original
        // ISO-8859-1 bytes before deserializing it.
        vector<char> srcBytes = BytesFromStringVal(src);

        Builder b(14, 25);
        HLL temp = b.build(srcBytes);
        hll->addAll(temp);
}

I suspect some memory corruption happens when the query is distributed across
machines, but I haven't been able to find the root cause yet. Let me know if
you need more information from me.


-- 
Thanks
Abhinav Jha
