Hello,

Thank you for your advice! I'll try to adapt it to my code.
Best,
--

Tue, 3 Dec 2019 at 17:16, Antoine Pitrou <anto...@python.org>:
>
> Agreed. I've opened https://issues.apache.org/jira/browse/ARROW-7302 to
> track it.
>
> Regards
>
> Antoine.
>
>
> On 03/12/2019 at 04:55, Wes McKinney wrote:
> > An option was recently added to dictionary encode all string columns
> >
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/options.h#L82
> >
> > I think it would be useful to be able to hard-opt-in to
> > dictionary-encode a particular column (regardless of what the
> > cardinality ends up being). Whatever the way to do this, it should be
> > clear and well documented. A new JIRA issue may be in order. Antoine,
> > what do you think?
> >
> > On Sun, Dec 1, 2019 at 5:32 PM ntfs hard <ntfs.h...@gmail.com> wrote:
> >>
> >> Hello
> >>
> >> I'm a newcomer and not quite sure about the library usage. I tried to
> >> find some documentation about it but failed.
> >>
> >> I have a dataset in a CSV file where one column (let's call it colour)
> >> is a string category. I'd like to get indices instead of text to pass
> >> into an algorithm.
> >> I tried to set column_types in ConvertOptions to
> >> {{"colour", arrow::dictionary(std::make_shared<arrow::Int32Type>(),
> >> arrow::utf8()) }} but it seems this is not the right API usage; a wild
> >> run-time error appears: NotImplemented: CSV conversion to
> >> dictionary<values=string, indices=int32, ordered=0> is not supported
> >> I also found a merged PR #5785 <https://github.com/apache/arrow/pull/5785>
> >> but I'm not quite sure that it's applicable to my case.
> >>
> >> So, my question is: can I get indices for a category column using only
> >> the library API? And if yes, what am I doing wrong? :)
> >>
> >> *In other words,* I'd like to do something like this Python pandas code:
> >> df[column] = df[column].cat.codes # if str(column_data_type) == "category"
> >>
> >> Thank you!
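
For readers following the thread, here is a minimal sketch of what the "indices for a category column" in the original question amount to. It is plain standard C++ and deliberately does not use the Arrow API. The names `EncodedColumn` and `DictionaryEncode` are hypothetical, introduced only for illustration. The sketch assigns indices in order of first appearance, which is an assumption of this example; pandas typically sorts the categories, so its `cat.codes` may differ.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper type (not part of Arrow): a dictionary-encoded column.
struct EncodedColumn {
  std::vector<std::string> dictionary;  // distinct values, in order of first appearance
  std::vector<std::int32_t> indices;    // one index into `dictionary` per input row
};

// Hypothetical helper (not part of Arrow): map each string to an int32 index.
EncodedColumn DictionaryEncode(const std::vector<std::string>& values) {
  EncodedColumn out;
  std::unordered_map<std::string, std::int32_t> seen;  // value -> assigned index
  out.indices.reserve(values.size());
  for (const std::string& value : values) {
    auto it = seen.find(value);
    if (it == seen.end()) {
      // First time this value appears: give it the next free index.
      const auto index = static_cast<std::int32_t>(out.dictionary.size());
      it = seen.emplace(value, index).first;
      out.dictionary.push_back(value);
    }
    out.indices.push_back(it->second);
  }
  return out;
}
```

With the input {"red", "green", "red", "blue"}, the dictionary is {"red", "green", "blue"} and the indices are {0, 1, 0, 2}. The indices vector is what would be passed to the algorithm in place of the text column.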