How big is the entire table you index?

You can use monitoring tools like BigDesk to verify the resources ES is using.

It is close to impossible that base64 encoding alone makes indexing 20x
slower. Maybe the mapper attachment plugin is doing other extra work.
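
To put a number on the base64 part: the encoding grows the payload by
about one third and is cheap to compute. A minimal Python sketch to check
this outside of ES follows; the sample text and its size are made up for
illustration and are not taken from the table in question.

```python
import base64
import time


def base64_overhead(payload: bytes) -> float:
    """Return the encoded size divided by the raw size."""
    encoded = base64.b64encode(payload)
    return len(encoded) / len(payload)


# Hypothetical sample: about 3 MB of HTML-like text standing in for the column data.
sample = b"<p>Some knowledge base article text.</p>" * 75_000

start = time.perf_counter()
ratio = base64_overhead(sample)
elapsed = time.perf_counter() - start

# The ratio is close to 4/3; the elapsed time depends on the machine.
print(f"size ratio: {ratio:.3f}, encode time: {elapsed:.4f}s")
```

If the ratio stays near 4/3 and the encode time is small, the extra
minutes are more likely spent elsewhere, for example in the SQL conversion
or in the attachment parsing.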

Jörg

On Mon, Feb 23, 2015 at 9:50 AM, Jiri Pik <jiri....@jiripik.com> wrote:

>  Thank you for opening the issue.
>
>
>
> If I index the column as varchar and use the default ES indexing, the
> entire table is indexed within 5 seconds. If I use the Mapper Attachments,
> it takes up to 2 minutes. I am not sure whether that is because of the extra
> work SQL Server is doing or the extra volume the JDBC river has to handle,
> but I suspect it is because of the way the Mapper Attachments works.
>
>
>
>
>
>
>
> *From:* elasticsearch@googlegroups.com [mailto:
> elasticsearch@googlegroups.com] *On Behalf Of *joergpra...@gmail.com
> *Sent:* Monday, February 23, 2015 9:26 AM
> *To:* elasticsearch@googlegroups.com
> *Subject:* Re: FW: Indexing of HTML Column in an MS SQL Server 2014
> database
>
>
>
> 1. I opened an issue for adding optional base64 encoding on columns:
> https://github.com/jprante/elasticsearch-river-jdbc/issues/472
>
>
>
> 2. What is "initial indexing"? What do you mean by "slower"?
>
>
>
> 3. Yes, you can change the documented bulk index settings.
>
>
>
> Jörg
>
>
>
>
>
> On Mon, Feb 23, 2015 at 6:12 AM, Jiri Pik <jiri....@jiripik.com> wrote:
>
>  Apologies to everyone for sending these emails with a digital signature,
> which may have caused some issues.
>
>
>
> Summary for Joerg:
>
>
>
> 1. Is there a way for the JDBC river to transform the nvarchar(MAX)
> column into Base64 by itself? I can do it on SQL Server (see item 1 for
> David below), but it is substantially slower.
>
> 2. If not, do you recommend varbinary(MAX) or some other MS SQL
> Server type? And would SELECT * FROM XXX then just work?
>
>
>
> Summary for David:
>
> 1. If I convert the HTML column using
>
>    select ID,
>           cast(N'' as xml).value(
>             'xs:base64Binary(xs:hexBinary(sql:column("k.Content")))',
>             'varchar(max)') as Content
>    from (select ID,
>                 cast(cast(Content as varchar(max)) as varbinary(max)) Content
>          from KBArticles) k;
>
> the indexing just works but takes longer than usual. Is there any
> performance setting I could use?
>
> 2. Would it be possible for the attachment mapper to index a plain text
> file without base64?
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From:* Jiri Pik
> *Sent:* Monday, February 23, 2015 6:08 AM
> *To:* elasticsearch@googlegroups.com
> *Subject:* RE: Indexing of HTML Column in an MS SQL Server 2014 database
>
>
>
> Thank you very much for your kind answer. If I encode the html file into
> Base64, and use the enclosed script, then all works just fine.
>
>
>
> So, Joerg:
>
>
>
> 1. Is there a way for the JDBC river to transform the nvarchar(MAX)
> column into Base64 by itself?
>
> 2. If not, do you recommend varbinary(MAX) or some other MS SQL
> Server type? And would SELECT * FROM XXX then just work?
>
>
>
> What are your thoughts?
>
>
>
> BTW I have been able to convert the nvarchar to base64 using this query:
>
>    select ID,
>           cast(N'' as xml).value(
>             'xs:base64Binary(xs:hexBinary(sql:column("k.Content")))',
>             'varchar(max)') as Content
>    from (select ID,
>                 cast(cast(Content as varchar(max)) as varbinary(max)) Content
>          from KBArticles) k;
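
For comparison, the same text-to-bytes-to-base64 step can be done outside
SQL Server. The sketch below is only an illustration in Python for a
hand-written indexing script; it is not a feature of the JDBC river. The
function names and the UTF-8 encoding are assumptions (the query above
goes through varchar, so it uses the database code page instead).

```python
import base64


def html_to_base64(html: str, encoding: str = "utf-8") -> str:
    """Encode HTML text to bytes, then to a base64 ASCII string."""
    raw = html.encode(encoding)  # assumption: UTF-8 rather than the SQL Server code page
    return base64.b64encode(raw).decode("ascii")


def build_document(row_id: int, html: str) -> dict:
    """Build a document with the same two columns the query returns."""
    return {"ID": row_id, "Content": html_to_base64(html)}


# Hypothetical row, standing in for one record of KBArticles.
doc = build_document(1, "<p>Hello</p>")
print(doc)
```

Decoding the Content value with base64.b64decode and the same encoding
returns the original HTML, which is a quick way to verify a row.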
>
>
>
>
>
> The usual river and mapper attachment work just fine but the initial
> indexing takes substantially longer. Why?
>
>
>
> 3. Are there any performance settings I could tweak?
>
>
>
> *From:* elasticsearch@googlegroups.com [
> mailto:elasticsearch@googlegroups.com <elasticsearch@googlegroups.com>] *On
> Behalf Of *joergpra...@gmail.com
> *Sent:* Sunday, February 22, 2015 6:12 PM
> *To:* elasticsearch@googlegroups.com
> *Subject:* Re: Indexing of HTML Column in an MS SQL Server 2014 database
>
>
>
> Can you give some information about the mapper attachment setup you used
> successfully?
>
>
>
> There is no good reason why this should not be possible with JDBC river.
>
>
>
> Jörg
>
>
>
> On Sun, Feb 22, 2015 at 5:20 PM, Jiri Pik <jiri....@googlemail.com> wrote:
>
>  I need to index an HTML column (nvarchar(MAX)) in an MS SQL Server
> database. I have set up a JDBC river
> (https://github.com/jprante/elasticsearch-river-jdbc) and the database is
> indexed.
>
> Using
>
>   "settings":{
>     "analysis":{
>       "analyzer":{
>         "default":{
>           "type":"custom",
>           "tokenizer":"standard",
>           "filter":[ "standard", "lowercase" ],
>           "char_filter" : ["html_strip"]
>         }
>       }
>     }
>   }
>
> is good for searching but not for the highlighter, which sometimes
> returns truncated, unpaired HTML tags.
>
> I have played with the Mapper Attachments with HTML attachments and then
> the highlighter works well (all original HTML tags are gone), but I am
> unable to get the river to push the column directly to the Mapper Attachments.
>
> Questions:
>
> 1. What is the best practice for indexing HTML columns? I am aware of the
> possibility of removing HTML tags manually using Agility Pack but do not
> like that, as it is too much extra maintenance.
>
> 2. Is there any better highlighter for HTML data which doesn't cut off any
> original HTML tags?
>
> 3. How to plug in the JDBC river to Mapper Attachments?
>
> 4. Any better ideas how to achieve my goals?
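
As a point of reference for question 1, manual tag removal before indexing
can be fairly small. The sketch below uses Python's standard html.parser
instead of Agility Pack; the class and function names are made up for
illustration. It joins text chunks with a space, so a word split by an
inline tag would come out as two words, and it keeps the text of script
and style elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self) -> str:
        # Join chunks with a space and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html: str) -> str:
    """Return the plain text of an HTML string, without tags."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(strip_html("<p>Hello <b>world</b></p>"))
```

Indexing the stripped text in a separate plain field would give the
highlighter nothing to cut off, at the cost of storing the content twice.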
>
>
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/f175734b-0889-40a9-96d1-d46702e56666%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/f175734b-0889-40a9-96d1-d46702e56666%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
>
>
