Jim's approach seems like a reasonable way to go.

Giang, can you create a JIRA for this request?  You are welcome to start
working on it if you would like to contribute this to improve CRF usability.

Frank

On Tue, Feb 23, 2016 at 3:24 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:

> On 2/23/16 11:07 AM, Nguyen,Giang H wrote:
>
>> I think it could be very helpful if we wrote a Python script in MADlib to
>> tokenize words, assign the corresponding doc_id and start_pos, and store
>> the result in the database. Users would then save a lot of time when
>> using CRF, and could conveniently run the CRF model on large test
>> data.
>>
>
> Perhaps the Postgres text search stuff could be used for this (maybe
> to_tsvector())?
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>
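For illustration, the tokenization step Giang describes might look roughly like the sketch below. It makes several assumptions. The tokenizer regex and the name `tokenize_docs` are hypothetical and not MADlib code. `start_pos` is assumed to be the 0-based token position within each document. The output columns are assumed to be (doc_id, start_pos, seg_text).

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical tokenizer: runs of letters/digits/apostrophes form one token,
# and every other non-whitespace character (punctuation) is its own token.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']")


def tokenize_docs(docs: Iterable[Tuple[int, str]]) -> List[Tuple[int, int, str]]:
    """Return one (doc_id, start_pos, seg_text) row per token.

    Assumption: start_pos is the 0-based position of the token
    within its own document, restarting at 0 for each doc_id.
    """
    rows = []
    for doc_id, text in docs:
        for pos, match in enumerate(TOKEN_RE.finditer(text)):
            rows.append((doc_id, pos, match.group(0)))
    return rows


if __name__ == "__main__":
    for row in tokenize_docs([(1, "Jim's approach works."), (2, "Use CRF now")]):
        print(row)
```

The resulting rows could then be bulk-loaded into a table (for example with COPY) for CRF feature extraction. One caveat about the to_tsvector() idea: it normalizes words into lexemes and, depending on the text search configuration, drops stop words and punctuation. Its positions may therefore not line up one-to-one with the raw tokens a CRF model is trained on.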
