Full text indexing with Solr

2013-09-24 Thread Chetan Mehrotra
Hi,

When Oak uses Solr, do we send the complete binary to Solr for
full-text indexing, or do we extract the content on the Oak side and
send only the extracted text?

And if we send the complete binary content, do we send it inline, or is
it first uploaded to Solr and a reference to it passed?

Chetan Mehrotra


Re: Full text indexing with Solr

2013-09-24 Thread Tommaso Teofili
Hi Chetan,

I think that currently the complete binary is sent, and then on the Solr
side you can choose which field to use for indexing and searching
properties of Binary type via the
OakSolrConfiguration#getFieldNameFor(Type propertyType) [1] method.

Currently the default configuration and implementation use a Solr dynamic
field of binary type, so that a binary property called propname is indexed
in a Solr field called propname_bin of type BinaryField [2]. However, my
plan is to implement some dedicated analyzers that use Apache Tika to
extract the text and index that instead (or as well).
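
To illustrate the naming convention described above, here is a minimal
sketch in plain Java. It is not the actual Oak code: the class
BinaryFieldNaming, the enum PropertyKind and the method solrFieldFor are
hypothetical names made up for this example, and only the "_bin" suffix
comes from the description above. The real mapping goes through
OakSolrConfiguration#getFieldNameFor and may behave differently.

```java
import java.util.EnumMap;
import java.util.Map;

class BinaryFieldNaming {

    // Hypothetical stand-in for Oak's property types; not the real Oak Type class.
    enum PropertyKind { STRING, LONG, BINARY }

    // Assumed suffix table for Solr dynamic fields.
    // Only the "_bin" suffix for binaries is taken from the thread.
    private static final Map<PropertyKind, String> SUFFIXES =
            new EnumMap<>(PropertyKind.class);

    static {
        SUFFIXES.put(PropertyKind.BINARY, "_bin");
    }

    // Returns the Solr field name for a property: the property name plus the
    // suffix registered for its kind, or the bare name if no suffix is registered.
    static String solrFieldFor(String propertyName, PropertyKind kind) {
        String suffix = SUFFIXES.get(kind);
        return suffix == null ? propertyName : propertyName + suffix;
    }

    public static void main(String[] args) {
        System.out.println(solrFieldFor("propname", PropertyKind.BINARY)); // propname_bin
        System.out.println(solrFieldFor("title", PropertyKind.STRING));    // title
    }
}
```

With a naming scheme like this, a dynamic field pattern such as *_bin in
the Solr schema would catch every binary property without listing each
field by name.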

Regards,
Tommaso

[1] http://svn.apache.org/repos/asf/jackrabbit/oak/trunk/oak-solr-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/solr/OakSolrConfiguration.java
[2] http://lucene.apache.org/solr/4_4_0/solr-core/org/apache/solr/schema/BinaryField.html





Re: Full text indexing with Solr

2013-09-26 Thread Chetan Mehrotra
Thanks for the details, Tommaso. I will look at the code for the
implementation details.
Chetan Mehrotra

