1. I opened an issue for adding optional base64 encoding on columns:
https://github.com/jprante/elasticsearch-river-jdbc/issues/472

2. What is "initial indexing"? What do you mean by "slower"?

3. Yes, you can change the documented bulk index settings.

Jörg
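
Regarding point 1: the mapper attachment type expects Base64 text, so the encoding could also be done on the client side before a row is indexed, instead of inside the SQL query. Below is a minimal Python sketch of that idea, not part of the JDBC river or the mapper attachments plugin. It assumes UTF-8 is the desired byte encoding (the query quoted below casts to varchar first, so it uses the database code page instead) and that the target field is named Content as in that query. The helper names are hypothetical.

```python
import base64
import json


def encode_html_for_attachment(html: str) -> str:
    """Return the Base64 text of the UTF-8 bytes of an HTML string."""
    return base64.b64encode(html.encode("utf-8")).decode("ascii")


def build_document(row_id: int, html: str, field: str = "Content") -> dict:
    # "ID" and "Content" mirror the column names used in the thread;
    # the overall document shape is an assumption for illustration.
    return {"ID": row_id, field: encode_html_for_attachment(html)}


doc = build_document(1, "<p>Hello <b>world</b></p>")
print(json.dumps(doc))
```

Decoding the field with base64.b64decode gives back the original HTML bytes, which is a quick way to check a row before sending it to the index.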


On Mon, Feb 23, 2015 at 6:12 AM, Jiri Pik <jiri....@jiripik.com> wrote:

> Apologies to everyone for sending these emails with a digital signature,
> which may have caused some issues.
>
>
>
> Summary for Joerg:
>
>
>
> 1. Is there a way for the JDBC river to transform the nvarchar(MAX)
>    column into Base64 by itself? I can do it on SQL Server (see (1) below,
>    for David), but it's substantially slower.
>
> 2. If not, do you recommend varbinary(MAX) or some other MS SQL Server
>    type? And then SELECT * FROM XXX would just work?
>
>
>
> Summary for David:
>
> 1. If I convert the HTML column using
>
>    select ID,
>           cast(N'' as xml).value(
>             'xs:base64Binary(xs:hexBinary(sql:column("k.Content")))',
>             'varchar(max)') as Content
>    from (select ID,
>                 cast(cast(Content as varchar(max)) as varbinary(max)) as Content
>          from KBArticles) k;
>
>    the indexing just works but takes longer than usual. Is there any
>    performance setting I could use?
>
> 2. Would it be possible for the attachment mapper to index a plain text
>    file without Base64?
>
> *From:* Jiri Pik
> *Sent:* Monday, February 23, 2015 6:08 AM
> *To:* elasticsearch@googlegroups.com
> *Subject:* RE: Indexing of HTML Column in an MS SQL Server 2014 database
>
>
>
> Thank you very much for your kind answer. If I encode the HTML file into
> Base64 and use the enclosed script, then everything works just fine.
>
>
>
> So, Joerg:
>
>
>
> 1. Is there a way for the JDBC river to transform the nvarchar(MAX)
>    column into Base64 by itself?
>
> 2. If not, do you recommend varbinary(MAX) or some other MS SQL Server
>    type? And then SELECT * FROM XXX would just work?
>
>
>
> What are your thoughts?
>
>
>
> BTW, I have been able to convert the nvarchar to Base64 using this query:
>
>    select ID,
>           cast(N'' as xml).value(
>             'xs:base64Binary(xs:hexBinary(sql:column("k.Content")))',
>             'varchar(max)') as Content
>    from (select ID,
>                 cast(cast(Content as varchar(max)) as varbinary(max)) as Content
>          from KBArticles) k;
>
>
>
>
>
> With this query, the usual river and mapper attachment work just fine, but
> the initial indexing takes substantially longer. Why?
>
>
>
> 3. Are there any performance settings I could tweak?
>
>
>
> *From:* elasticsearch@googlegroups.com
> [mailto:elasticsearch@googlegroups.com] *On Behalf Of* joergpra...@gmail.com
> *Sent:* Sunday, February 22, 2015 6:12 PM
> *To:* elasticsearch@googlegroups.com
> *Subject:* Re: Indexing of HTML Column in an MS SQL Server 2014 database
>
>
>
> Can you give some information about the mapper attachment setup you used
> successfully?
>
>
>
> There is no good reason why this should not be possible with JDBC river.
>
>
>
> Jörg
>
>
>
> On Sun, Feb 22, 2015 at 5:20 PM, Jiri Pik <jiri....@googlemail.com> wrote:
>
> I need to index an HTML column (nvarchar(MAX)) in an MS SQL Server
> database. I have set up a JDBC river
> (https://github.com/jprante/elasticsearch-river-jdbc) and the database is
> indexed.
>
> Using
>
>   "settings":{
>     "analysis":{
>       "analyzer":{
>         "default":{
>           "type":"custom",
>           "tokenizer":"standard",
>           "filter":[ "standard", "lowercase" ],
>           "char_filter":[ "html_strip" ]
>         }
>       }
>     }
>   }
>
> works well for searching but not for the highlighter, which sometimes
> returns truncated, unpaired HTML tags.
>
> I have played with the Mapper Attachments using HTML attachments, and then
> the highlighter works well (all original HTML tags are gone), but I am
> unable to get the river to push the column directly to the Mapper
> Attachments.
>
> Questions:
>
> 1. What is the best practice for indexing HTML columns? I am aware that
> HTML tags can be removed manually using Agility Pack, but I do not like
> that as it's too much extra maintenance.
>
> 2. Is there a better highlighter for HTML data that doesn't cut off any
> original HTML tags?
>
> 3. How do I plug the JDBC river into Mapper Attachments?
>
> 4. Any better ideas on how to achieve my goals?
>
>
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/f175734b-0889-40a9-96d1-d46702e56666%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/f175734b-0889-40a9-96d1-d46702e56666%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
