GitHub user sounakr commented on the issue: https://github.com/apache/incubator-carbondata/pull/604

@jackylk: This problem is reproducible on a multi-node system, where a single load creates multiple CarbonData files written by multiple tasks. I have tested the code changes on a multi-node cluster with large data, and the output is correct.

Another problem observed during testing is choosing the correct cardinality for the new segment formed by compaction. Previously we copied the cardinality of the last segment to the new compacted segment. But with the IUD feature, updates can run on any segment, and if an update happens on an intermediate segment, there is a high probability that the cardinality of that segment will be higher than that of the last segment. So now, instead of copying the cardinality from the last segment, we calculate the highest cardinality values across all the segments.
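
The cardinality selection described above can be illustrated with a minimal sketch. It assumes each segment exposes its per-column cardinality as a list of integers in the same column order; the function name `merged_cardinality` and the sample values are hypothetical and are not taken from the CarbonData code base.

```python
from typing import List, Sequence


def merged_cardinality(segment_cardinalities: Sequence[Sequence[int]]) -> List[int]:
    """Return the per-column maximum cardinality across all segments.

    Hypothetical helper: the real compaction flow may read these values
    differently; this only shows the "take the highest value" rule.
    """
    if not segment_cardinalities:
        return []
    width = len(segment_cardinalities[0])
    if any(len(c) != width for c in segment_cardinalities):
        raise ValueError("all segments must have the same number of columns")
    # zip(*...) groups the values column by column across segments.
    return [max(column) for column in zip(*segment_cardinalities)]


# Made-up cardinalities for three segments with three dimension columns.
# The middle segment was updated after load, so its second column is
# larger than the corresponding value in the last segment.
segments = [
    [10, 40, 5],
    [10, 75, 5],
    [12, 50, 6],
]

compacted = merged_cardinality(segments)  # new behaviour: max over all segments
last_segment_only = segments[-1]          # old behaviour: copy the last segment
```

With these sample values, copying the last segment would record 50 for the second column even though an intermediate segment already holds 75 distinct values, which is the case the per-column maximum is meant to cover.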