[ https://issues.apache.org/jira/browse/ARROW-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ji Liu closed ARROW-6308.
-------------------------
    Resolution: Invalid

> [Java] Support write interleaved dictionaries and batches in IPC stream
> -----------------------------------------------------------------------
>
>                 Key: ARROW-6308
>                 URL: https://issues.apache.org/jira/browse/ARROW-6308
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Ji Liu
>            Assignee: Ji Liu
>            Priority: Major
>
> Per the discussions in the following threads, and as the
> spec ([http://arrow.apache.org/docs/format/IPC.html#streaming-format])
> describes, dictionary batches and record batches can be interleaved as long
> as a record batch does not reference a dictionary that has not been written yet.
> [https://github.com/apache/arrow/pull/4960]
> [https://github.com/apache/arrow/pull/5146]
> Reading interleaved dictionaries and batches is currently supported
> via ARROW-6040, but it is impossible to write data in this format.
> The cases below should be supported:
> i. A record batch with one dictionary-encoded column S
> # Schema
> # RecordBatch: S=[null, null, null, null]
> # DictionaryBatch: ['abc', 'efg']
> # RecordBatch: S=[0, 1, 0, 1]
> ii. A record batch with two dictionary-encoded columns S1, S2
> # Schema
> # DictionaryBatch S1: ['ab', 'cd']
> # RecordBatch: S1=[0, 1, 0, 1] S2=[null, null, null, null]
> # DictionaryBatch S2: ['cc', 'dd']
> # RecordBatch: S1=[0, 1, 0, 1] S2=[0, 1, 0, 1]
> This issue records the problem; the work should be done after a mailing
> list discussion.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
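
The ordering rule behind cases i and ii in the issue description can be modeled with a small, self-contained sketch. It does not use the Arrow API: `DictionaryBatch`, `RecordBatch`, and `validate_stream` are hypothetical names introduced only for illustration, the Schema message is omitted, and dictionaries are keyed by column name here, whereas the real format identifies them by dictionary id. The sketch assumes a stream is acceptable when every non-null index in a record batch points into a dictionary that appeared earlier in the stream.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class DictionaryBatch:
    column: str        # hypothetical: dictionary keyed by column name, not id
    values: List[str]


@dataclass
class RecordBatch:
    # Dictionary indices per column; None stands for a null slot.
    columns: Dict[str, List[Optional[int]]]


Message = Union[DictionaryBatch, RecordBatch]


def validate_stream(messages: List[Message]) -> bool:
    """Return True if no record batch uses an index before its dictionary is seen."""
    seen: Dict[str, List[str]] = {}
    for msg in messages:
        if isinstance(msg, DictionaryBatch):
            seen[msg.column] = msg.values
            continue
        for column, indices in msg.columns.items():
            for index in indices:
                if index is None:
                    continue  # nulls never reference the dictionary
                values = seen.get(column)
                if values is None or not 0 <= index < len(values):
                    return False
    return True


# Case i from the issue: an all-null batch precedes the dictionary.
case_i = [
    RecordBatch({"S": [None, None, None, None]}),
    DictionaryBatch("S", ["abc", "efg"]),
    RecordBatch({"S": [0, 1, 0, 1]}),
]

# Case ii from the issue: the S2 dictionary arrives after the first batch.
case_ii = [
    DictionaryBatch("S1", ["ab", "cd"]),
    RecordBatch({"S1": [0, 1, 0, 1], "S2": [None, None, None, None]}),
    DictionaryBatch("S2", ["cc", "dd"]),
    RecordBatch({"S1": [0, 1, 0, 1], "S2": [0, 1, 0, 1]}),
]

# Counter-example (not from the issue): indices used before any dictionary.
premature = [
    RecordBatch({"S": [0, 1, 0, 1]}),
    DictionaryBatch("S", ["abc", "efg"]),
]
```

Under these assumptions, `validate_stream(case_i)` and `validate_stream(case_ii)` both return `True`, while `validate_stream(premature)` returns `False`, because that batch uses indices before any dictionary for `S` has appeared.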