1. I opened an issue for adding optional base64 encoding on columns: https://github.com/jprante/elasticsearch-river-jdbc/issues/472
2. What is "initial indexing"? What do you mean by "slower"?

3. Yes, you can change the documented bulk index settings.

Jörg

On Mon, Feb 23, 2015 at 6:12 AM, Jiri Pik <jiri....@jiripik.com> wrote:

> Apologies to everyone for sending these emails with a digital signature,
> which may have caused some issues.
>
> Summary for Joerg:
>
> 1. Is there a way for the JDBC river to transform the nvarchar(MAX) column
>    into Base64 by itself? I can do it on SQL Server (see (1) for David
>    below), but it's substantially slower.
>
> 2. If not, do you recommend varbinary(MAX) or some other MS SQL Server
>    type? And then the SELECT * from XXX would just work?
>
> Summary for David:
>
> 1. If I convert the HTML column using
>
>      select ID,
>             cast(N'' as xml).value(
>               'xs:base64Binary(xs:hexBinary(sql:column("k.Content")))',
>               'varchar(max)') as Content
>      from (SELECT ID,
>                   cast(cast(Content as varchar(MAX)) as varbinary(MAX)) Content
>            from KBArticles) k;
>
>    the indexing just works but takes longer than usual. Is there any
>    performance setting I could use?
>
> 2. Would it be possible for the attachment mapper to index a pure txt
>    file without base64?
>
> *From:* Jiri Pik
> *Sent:* Monday, February 23, 2015 6:08 AM
> *To:* elasticsearch@googlegroups.com
> *Subject:* RE: Indexing of HTML Column in an MS SQL Server 2014 database
>
> Thank you very much for your kind answer. If I encode the html file into
> Base64, and use the enclosed script, then all works just fine.
>
> So, Joerg:
>
> 1. Is there a way for the JDBC river to transform the nvarchar(MAX) column
>    into Base64 by itself?
>
> 2. If not, do you recommend varbinary(MAX) or some other MS SQL Server
>    type? And then the SELECT * from XXX would just work?
>
> What are your thoughts?
>
> BTW I have been able to convert the nvarchar to base64 using this query:
>
>      select ID,
>             cast(N'' as xml).value(
>               'xs:base64Binary(xs:hexBinary(sql:column("k.Content")))',
>               'varchar(max)') as Content
>      from (SELECT ID,
>                   cast(cast(Content as varchar(MAX)) as varbinary(MAX)) Content
>            from KBArticles) k;
>
> The usual river and mapper attachment work just fine but the initial
> indexing takes substantially longer. Why?
>
> 3. Are there any performance settings I could tweak?
>
> *From:* elasticsearch@googlegroups.com
> [mailto:elasticsearch@googlegroups.com] *On Behalf Of* joergpra...@gmail.com
> *Sent:* Sunday, February 22, 2015 6:12 PM
> *To:* elasticsearch@googlegroups.com
> *Subject:* Re: Indexing of HTML Column in an MS SQL Server 2014 database
>
> Can you give some information about the mapper attachment setup you used
> successfully?
>
> There is no good reason why this should not be possible with JDBC river.
>
> Jörg
>
> On Sun, Feb 22, 2015 at 5:20 PM, Jiri Pik <jiri....@googlemail.com> wrote:
>
> I need to index an HTML column (nvarchar(MAX)) in an MS SQL Server
> database. I have set up a JDBC river
> https://github.com/jprante/elasticsearch-river-jdbc and the database is
> indexed.
>
> Using
>
>      "settings":{
>        "analysis":{
>          "analyzer":{
>            "default":{
>              "type":"custom",
>              "tokenizer":"standard",
>              "filter":[ "standard", "lowercase" ],
>              "char_filter" : ["html_strip"]
>            }
>          }
>        }
>      }
>
> is good for searching but not for the highlighter, as that sometimes
> returns trimmed, unpaired html tags.
>
> I have played with the Mapper Attachments with HTML attachments and then
> the highlighter works well (all original html tags are gone), but I am
> unable to get the river to push the column directly to the Mapper
> Attachments.
>
> Questions:
>
> 1. What is the best practice for indexing HTML columns?
>    I am aware of the possibility of a manual removal of HTML tags using
>    Agility Pack but do not like that as it's too much extra maintenance.
>
> 2. Is there any better highlighter for html data which doesn't cut off any
>    original html tags?
>
> 3. How to plug in the JDBC river to Mapper Attachments?
>
> 4. Any better ideas how to achieve my goals?
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/f175734b-0889-40a9-96d1-d46702e56666%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/f175734b-0889-40a9-96d1-d46702e56666%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.
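For reference, a minimal sketch of doing the Base64 step in application code rather than in the SQL query, assuming the rows are read by a client program and not by the river itself. The function names and the document shape are hypothetical; only the ID and Content column names come from the query quoted in this thread. Note that the sketch encodes the text as UTF-8, whereas the SQL version casts to varchar first and so uses the database code page; the two may produce different bytes for non-ASCII content.

```python
import base64


def encode_html_column(html_text: str, encoding: str = "utf-8") -> str:
    """Return the Base64 form of an HTML string as an ASCII str."""
    # UTF-8 is an assumption of this sketch, not something the thread specifies.
    raw = html_text.encode(encoding)
    return base64.b64encode(raw).decode("ascii")


def build_document(row_id: int, html_text: str) -> dict:
    """Build a hypothetical document with the HTML column Base64-encoded."""
    # "ID" and "Content" mirror the column names in the quoted query.
    return {"ID": row_id, "Content": encode_html_column(html_text)}


if __name__ == "__main__":
    doc = build_document(1, "<p>Hello <b>world</b></p>")
    print(doc["Content"])
    # Decode again to confirm the round trip preserves the original markup.
    print(base64.b64decode(doc["Content"]).decode("utf-8"))
```

Whether this is actually faster than the xs:base64Binary query would have to be measured; the sketch only shows that the encoding itself does not need to happen in the database.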