Hi,

Impala version: 3.1.0
I have serialized data in Kudu, stored as bytes in a column. The data is a pure ISO-8859-1 representation of an HLL object, which has to be reconstructed into an HLL object and merged inside a UDA function. Impala can read these objects as strings, and the UDA defined for the merge is string -> string and returns the cardinality.

The unit tests run without failures on a single node, even when a large number of HLLs are passed. On a real cluster, however, the query dies almost every time. I have been able to isolate the problem to the Update function:

    IMPALA_UDF_EXPORT
    void HLLUpdate(FunctionContext* context, const StringVal& src, StringVal* result) {
      if (src.is_null || result->is_null) return;
      HLL* hll = reinterpret_cast<HLL*>(result->ptr);
      Builder *b = new Builder(14, 25);
      HLL temp = b->build(s, src.len);
      vector<char> srcBytes = BytesFromStringVal(src); // this is where the deserialization happens back from utf-8 to iso-8859-1
      HLL temp = b->build(srcBytes);
      delete b;
      hll->addAll(temp);
    }

There might be memory corruption when the query is distributed across machines, but I haven't been able to find the root cause yet. Let me know if you need more information from my side.

--
Thanks
Abhinav Jha
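
Note: the body of BytesFromStringVal is not shown in this post. For readers following the UTF-8 to ISO-8859-1 step mentioned in the code comment, here is a minimal sketch of what such a conversion might look like. It assumes every code point in the input is at most U+00FF; the name Utf8ToLatin1 and its signature are hypothetical and are not taken from the actual UDA or from the Impala UDF API.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not the poster's BytesFromStringVal): decode UTF-8
// input whose code points are all <= U+00FF back into ISO-8859-1 bytes,
// one output byte per code point.
std::vector<char> Utf8ToLatin1(const std::uint8_t* data, std::size_t len) {
  std::vector<char> out;
  out.reserve(len);
  for (std::size_t i = 0; i < len; ++i) {
    const std::uint8_t b = data[i];
    if (b < 0x80) {
      // ASCII is identical in UTF-8 and ISO-8859-1.
      out.push_back(static_cast<char>(b));
    } else if ((b == 0xC2 || b == 0xC3) && i + 1 < len &&
               (data[i + 1] & 0xC0) == 0x80) {
      // Two-byte sequence 110000xx 10yyyyyy encodes the byte xxyyyyyy.
      out.push_back(
          static_cast<char>(((b & 0x03) << 6) | (data[i + 1] & 0x3F)));
      ++i;  // skip the continuation byte just consumed
    } else {
      // Anything else is either malformed or outside the Latin-1 range.
      throw std::invalid_argument("input is not UTF-8-encoded ISO-8859-1");
    }
  }
  return out;
}
```

With a StringVal, a call might look like Utf8ToLatin1(src.ptr, src.len), since StringVal exposes a byte pointer and a length. Taking an explicit length rather than relying on a terminator matters here because serialized HLL data can contain zero bytes.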