Jim's approach seems like a reasonable way to go. Giang, can you create a JIRA for this request? You are welcome to start working on it if you would like to contribute this to improve CRF usability.
Frank

On Tue, Feb 23, 2016 at 3:24 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:

> On 2/23/16 11:07 AM, Nguyen,Giang H wrote:
>
>> I think It could be very helpful if we write a python script in Madlib to
>> tokenize words and assign the doc_id and start_pos correspondingly and
>> store it into the database. Hence, users can save a lot more time when
>> using CRF and also enable them to conveniently run crf model on big testing
>> data.
>
> Perhaps the Postgres text search stuff could be used for this (maybe
> to_tsvector())?
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
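
For illustration only, here is a minimal Python sketch of the tokenization step Giang describes: split each document into tokens and emit one (doc_id, start_pos, token) row per token. The function name `tokenize_documents`, the regular expression, and the assumption that start_pos is a 0-based token index within each document are all hypothetical and not taken from MADlib; loading the rows into the database is left out.

```python
import re

# Hypothetical token pattern (an assumption, not MADlib's tokenizer):
# a run of word characters, or a single punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_documents(docs):
    """Return a list of (doc_id, start_pos, token) tuples.

    doc_id is the 0-based index of the document in `docs`;
    start_pos is assumed to be the 0-based index of the token
    within its document.
    """
    rows = []
    for doc_id, text in enumerate(docs):
        for start_pos, token in enumerate(TOKEN_RE.findall(text)):
            rows.append((doc_id, start_pos, token))
    return rows


if __name__ == "__main__":
    # Print the rows that would be inserted into a segment table.
    for row in tokenize_documents(["Hello, world!", "CRF works."]):
        print(row)
```

The rows could then be bulk-inserted with any database driver. Jim's to_tsvector() suggestion would instead do the splitting inside Postgres, though it also normalizes tokens and drops stop words depending on the text search configuration, which may or may not suit CRF input.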