On Sat, Jul 1, 2017 at 2:24 AM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:
> In SQL indexes we may store partial strings and assume them to be in UTF-8;
> I don't think this can be abstracted away. But maybe this is not a big
> deal if we still use UTF-8 in indexes.
>

Sergi, why does it matter whether it is UTF-8 or a custom encoding? Why can't we use our own compact encoding in indexes?

>
> 2017-07-01 10:13 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>
> > Val, do you know how we compare strings in SQL queries? Will we be able to
> > use this encoder?
> >
> > Additionally, I think that the encoder is a bit too abstract. Why not go
> > even further and allow users to create their own ASCII table for encoding?
> >
> > D.
> >
> > On Fri, Jun 30, 2017 at 6:49 PM, Valentin Kulichenko <
> > valentin.kuliche...@gmail.com> wrote:
> >
> > > Andrey,
> > >
> > > Can you elaborate more on this? What is your concern?
> > >
> > > -Val
> > >
> > > On Fri, Jun 30, 2017 at 6:17 PM Andrey Mashenkov <
> > > andrey.mashen...@gmail.com> wrote:
> > >
> > > > Val,
> > > >
> > > > Looks like it makes sense.
> > > >
> > > > This will not affect the FullText index, as Lucene has its own format
> > > > for storing data.
> > > >
> > > > But... would it be compatible with H2 indexing? I doubt it.
> > > >
> > > > On Jul 1, 2017, at 2:27, "Valentin Kulichenko" <
> > > > valentin.kuliche...@gmail.com> wrote:
> > > >
> > > > > Folks,
> > > > >
> > > > > Currently the binary marshaller always encodes strings in UTF-8.
> > > > > However, sometimes it can be useful to customize this. For example,
> > > > > if data contains a lot of Cyrillic, Chinese or other symbols, but not
> > > > > so many Latin symbols, memory is used very inefficiently. In this case
> > > > > it would be great to encode the most frequently used symbols in one
> > > > > byte instead of two or three.
> > > > >
> > > > > I propose to introduce a BinaryStringEncoder interface that will
> > > > > convert strings to byte arrays and back, and make it pluggable via
> > > > > BinaryConfiguration. This will allow users to plug in any encoding
> > > > > algorithms based on their requirements.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > https://issues.apache.org/jira/browse/IGNITE-5655
> > > > >
> > > > > -Val
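To make the proposal concrete, here is a minimal Java sketch of what a pluggable string encoder could look like. It is not Ignite code: apart from the name BinaryStringEncoder, everything here is hypothetical, including the `encode`/`decode` method signatures, the `TableStringEncoder` class, and the ASCII-plus-Cyrillic table, which loosely follows the "own ASCII table" idea from the thread. It packs Cyrillic text into one byte per character instead of the two bytes UTF-8 uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// HYPOTHETICAL: the interface name comes from the proposal, but the method names and
// signatures are assumptions for illustration, not an existing Ignite API.
interface BinaryStringEncoder {
    byte[] encode(String s);     // String -> bytes stored in the binary object
    String decode(byte[] bytes); // bytes -> String on read
}

// HYPOTHETICAL: single-byte encoder driven by a user-supplied 256-entry table.
final class TableStringEncoder implements BinaryStringEncoder {
    private final char[] table;                                   // byte value (0..255) -> char
    private final Map<Character, Byte> reverse = new HashMap<>(); // char -> byte value

    TableStringEncoder(char[] table) {
        if (table.length != 256)
            throw new IllegalArgumentException("table must have exactly 256 entries");
        this.table = table.clone();
        for (int i = 0; i < 256; i++)
            reverse.putIfAbsent(this.table[i], (byte) i); // first occurrence wins
    }

    @Override public byte[] encode(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            Byte b = reverse.get(s.charAt(i));
            if (b == null) // characters outside the table cannot be represented
                throw new IllegalArgumentException("unmappable character: " + s.charAt(i));
            out[i] = b;
        }
        return out;
    }

    @Override public String decode(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++)
            chars[i] = table[bytes[i] & 0xFF]; // mask so the byte is used as an unsigned index
        return new String(chars);
    }
}

public class EncoderDemo {
    // Example table: bytes 0..127 are ASCII, bytes 128..255 map to U+0400..U+047F (Cyrillic block).
    static char[] asciiPlusCyrillic() {
        char[] t = new char[256];
        for (int i = 0; i < 128; i++) {
            t[i] = (char) i;
            t[128 + i] = (char) (0x0400 + i);
        }
        return t;
    }

    public static void main(String[] args) {
        BinaryStringEncoder enc = new TableStringEncoder(asciiPlusCyrillic());
        // "Privet, mir" ("Hello, world") written in Cyrillic, as Unicode escapes
        String s = "\u041F\u0440\u0438\u0432\u0435\u0442, \u043C\u0438\u0440";
        byte[] packed = enc.encode(s);
        System.out.println("table: " + packed.length + " bytes, UTF-8: "
                + s.getBytes(StandardCharsets.UTF_8).length + " bytes"); // table: 11 bytes, UTF-8: 20 bytes
        System.out.println("round trip ok: " + s.equals(enc.decode(packed))); // round trip ok: true
    }
}
```

Because every character takes exactly one byte, any byte prefix of an encoded value decodes to a prefix of the string, which bears on the partial-strings-in-indexes concern. Byte-wise ordering, however, only matches String ordering when the table is in ascending code-point order and bytes are compared unsigned, which an arbitrary user-defined table does not guarantee. How such an encoder would be registered in BinaryConfiguration is not shown, since the thread does not specify that API.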