Re: Indexing on plain text data and base64 encode data in a single HTTP POST request

2013-10-26 Thread Alexandre Rafalovitch
I think Jack already answered this one: "You can use an update processor."
Buy his book, I am sure he has lots of examples. :-)

Or just start from: https://wiki.apache.org/solr/UpdateRequestProcessor
And review:
http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/UpdateRequestProcessorFactory.html

   and specifically classes deriving from:
http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Sun, Oct 27, 2013 at 10:30 AM, neerajp  wrote:

> I cannot convert the base64-encoded data to text in my application, as it
> would impact my core application processing. I want this task to be done on
> the Solr side. Can I use Apache Tika for this on the Solr side?
> The data I am sending to Solr is in XML format, with some fields in plain
> text and some base64-encoded (they may contain PDF, DOC, text, images,
> etc.).
>
> If I had only text fields in the XML, I would have used
> XMLUpdateRequestHandler. Since the XML fields are of mixed types (US-ASCII
> characters and base64-encoded data), I am unsure how to proceed.
>
> Please share your thoughts.


Re: Indexing on plain text data and base64 encode data in a single HTTP POST request

2013-10-26 Thread neerajp
I cannot convert the base64-encoded data to text in my application, as it
would impact my core application processing. I want this task to be done on
the Solr side. Can I use Apache Tika for this on the Solr side?
The data I am sending to Solr is in XML format, with some fields in plain
text and some base64-encoded (they may contain PDF, DOC, text, images,
etc.).

If I had only text fields in the XML, I would have used
XMLUpdateRequestHandler. Since the XML fields are of mixed types (US-ASCII
characters and base64-encoded data), I am unsure how to proceed.

Please share your thoughts.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-data-and-base64-encode-data-in-a-single-HTTP-POST-request-tp4097905p4097930.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr For

2013-10-26 Thread Gora Mohanty
On 27 October 2013 07:22, Baskar Sikkayan  wrote:
>
> Hi,
>    I am looking for a Solr config for a job site. A job site has two main
> searches.
>
> 1) Employees can search for jobs (based on skill set, job location, title,
> salary)
> 2) Employers can search for employees (based on skill set, experience,
> location)

Please do some basic homework, and ask more specific questions.
This should be a simple problem to start with, but can become quite
complex depending on your needs, and nobody is going to do all the
configuration work for you. There are many Solr resources available:
Books, tutorials from searching Google, the Solr documentation itself.
A good starting point could be http://wiki.apache.org/solr/

> Should I have a separate config XML for both searches?

That should not be needed.

Regards,
Gora


Solr For

2013-10-26 Thread Baskar Sikkayan
Hi,
   I am looking for a Solr config for a job site. A job site has two main
searches.

1) Employees can search for jobs (based on skill set, job location, title,
salary)
2) Employers can search for employees (based on skill set, experience,
location)

Should I have a separate config XML for both searches?

Thanks,
Baskar


Re: Solr - what's the next big thing?

2013-10-26 Thread Bill Bell
Full JSON support: deep, complex object indexing and search. Game changer.

Bill Bell
Sent from mobile


> On Oct 26, 2013, at 1:04 PM, Otis Gospodnetic wrote:
> 
> Hi,
> 
>> On Sat, Oct 26, 2013 at 5:58 AM, Saar Carmi  wrote:
>> LOL,  Jack.  I can imagine Otis saying that.
> 
> Funny indeed, but not really.
> 
>> Otis, with this marriage, are we going to see MapReduce-based queries?
> 
> Can you please describe what you mean by that?  Maybe with an example.
> 
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
> 
>>> On Oct 25, 2013 10:03 PM, "Jack Krupansky"  wrote:
>>> 
>>> But a lot of that big yellow elephant stuff is in 4.x anyway.
>>> 
>>> (Otis: I was afraid that you were going to say that the next big thing in
>>> Solr is... Elasticsearch!)
>>> 
>>> -- Jack Krupansky
>>> 
>>> -Original Message- From: Otis Gospodnetic
>>> Sent: Friday, October 25, 2013 2:43 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr - what's the next big thing?
>>> 
>>> Saar,
>>> 
>>> The marriage with the big yellow elephant is a big deal. It changes the
>>> scale.
>>> 
>>> Otis
>>> Solr & ElasticSearch Support
>>> http://sematext.com/
>>> On Oct 25, 2013 5:32 AM, "Saar Carmi"  wrote:
>>> 
>>>> If I am not mistaken the most impressive improvement of Solr 4.0 compared
>>>> to previous versions was the Solr Cloud architecture.
>>>>
>>>> What would be the next big thing in Solr 5.0 ?
>>>>
>>>> Saar
>>> 


Re: Solr + SPDY

2013-10-26 Thread Vinay Pothnis
Hi Otis,

While the main goal of SPDY is to reduce page load times, I think we could
benefit from it in the Solr context as well.
The transport layer is still TCP, but SPDY allows multiplexing of
requests. It also uses compression and reduces the overhead of HTTP
headers.

An excerpt from http://webtide.intalio.com/2012/03/spdy-support-in-jetty/

"SPDY reduces roundtrips with the server, reduces the HTTP verboseness by
compressing HTTP headers, improves the utilization of the TCP connection,
multiplexes requests into a single TCP connection (instead of using a
limited number of connections, each serving only one request),"

1. Users who communicate with Solr through an HTTP client, whether for
sending updates or for searching, could benefit from SPDY
optimizations. They could use the Jetty HTTP Client and set up Solr on
Jetty to enable communication over SPDY.

2. As far as SolrCloud inter-node communication is concerned, I am not
sure how beneficial it would be. I brought this up because, in the
SolrCloud context, there is a lot of inter-node chatter to facilitate
distributed search and distributed indexing. So I was wondering whether
anyone else has given this any thought.

Cheers
Vinay

Some references:
http://www.chromium.org/spdy/spdy-whitepaper
http://webtide.intalio.com/2012/03/spdy-support-in-jetty/
http://www.eclipse.org/jetty/documentation/current/spdy.html


On Fri, Oct 25, 2013 at 12:22 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> I'm rusty on SPDY. Can you summarize the benefits in Solr context?  Thanks.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Oct 25, 2013 10:46 AM, "Vinay Pothnis"  wrote:
>
> > Hello,
> >
> > Couple of questions related to using SPDY with solr.
> >
> > 1. Does anybody have experience running Solr on Jetty 9 with SPDY
> > support - and using Jetty Client (SPDY capable client) to talk to Solr
> > over SPDY?
> >
> > 2. This is related to Solr - Cloud - inter node communication. This might
> > not be a user-list question - nonetheless, I was wondering if there would
> > be some way to enable the use of SPDY for inter-node communication in a
> > Solr Cloud set up. Is this something that the solr team might look at?
> >
> > Thanks
> > Vinay
> >
>


Re: Indexing on plain text data and base64 encode data in a single HTTP POST request

2013-10-26 Thread Jack Krupansky
You can use an update processor. For example, write a JavaScript script for the
stateless script update processor that takes a list of field names and
converts the base64 encoding to normal text for those specified fields.

Or, convert the base64 to text before you send the field values to Solr.
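
As a rough illustration of the per-field decoding step described above, here
is a minimal sketch in Python. It is not Solr's update-processor API; it only
shows the transformation such a processor (or the client) would apply. The
field names "folder", "body" and "attachment" are hypothetical, since the
original XML tags are not visible in the thread.

```python
import base64


def decode_base64_fields(doc, field_names, encoding="utf-8"):
    """Return a copy of doc with the named fields base64-decoded to text.

    Assumes each listed field holds base64 of *text*. For PDF or DOC
    attachments the decoded bytes are binary and would still need a text
    extraction step before they are useful in a text field.
    """
    decoded = dict(doc)
    for name in field_names:
        if name in decoded:
            raw = base64.b64decode(decoded[name], validate=True)
            decoded[name] = raw.decode(encoding)
    return decoded


# Hypothetical document mirroring the example in the original question.
doc = {
    "folder": "INBOX",
    "body": "HI I AM EMAIL BODY\r\n\r\nTHANKS",
    "attachment": "SGkgSSBBTSBBVFRBQ0hNRU5U",
}
result = decode_base64_fields(doc, ["attachment"])
print(result["attachment"])
```

Only the fields named in the list are touched, so plain-text fields pass
through unchanged and the original document is not modified.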

-- Jack Krupansky

-Original Message-
From: neerajp
Sent: Saturday, October 26, 2013 12:50 PM
To: solr-user@lucene.apache.org
Subject: Indexing on plain text data and base64 encode data in a single HTTP POST request


Hi,
I am using Solr to search my email data. My application is written in C++, so
I am using the cURL library to POST the data to Solr for indexing. I am posting
the data in XML format; some of the XML fields are plain text and some are
base64-encoded. I want to know what I should do so that Solr can index both
types of data (plain text as well as base64-encoded data) arriving in a single
XML file.

For reference, my XML file looks like:
"INBOXsolr solr
solr HI I AM EMAIL BODY\r\n\r\nTHANKSSGkgSSBBTSBBVFRBQ0hNRU5U"

In the above XML, all fields are plain US-ASCII characters except
email-attachment, which is base64-encoded. The attachment content type could
be PDF, DOC, a text file, etc.

Any help is highly appreciated.






Re: Solr - what's the next big thing?

2013-10-26 Thread Otis Gospodnetic
Hi,

On Sat, Oct 26, 2013 at 5:58 AM, Saar Carmi  wrote:
> LOL,  Jack.  I can imagine Otis saying that.

Funny indeed, but not really.

> Otis, with this marriage, are we going to see MapReduce-based queries?

Can you please describe what you mean by that?  Maybe with an example.

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



> On Oct 25, 2013 10:03 PM, "Jack Krupansky"  wrote:
>
>> But a lot of that big yellow elephant stuff is in 4.x anyway.
>>
>> (Otis: I was afraid that you were going to say that the next big thing in
>> Solr is... Elasticsearch!)
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Otis Gospodnetic
>> Sent: Friday, October 25, 2013 2:43 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr - what's the next big thing?
>>
>> Saar,
>>
>> The marriage with the big yellow elephant is a big deal. It changes the
>> scale.
>>
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>> On Oct 25, 2013 5:32 AM, "Saar Carmi"  wrote:
>>
>>> If I am not mistaken the most impressive improvement of Solr 4.0 compared
>>> to previous versions was the Solr Cloud architecture.
>>>
>>> What would be the next big thing in Solr 5.0 ?
>>>
>>> Saar
>>>
>>>
>>


Re: difference between apache tomcat vs Jetty

2013-10-26 Thread Scott Vanderbilt

On 10/25/2013 8:18 AM, Cassandra Targett wrote:


> In terms of adding or fixing documentation, the "Installing Solr" page
> (https://cwiki.apache.org/confluence/display/solr/Installing+Solr)
> includes a yellow box that says:
>
> "Solr ships with a working Jetty server, with optimized settings for
> Solr, inside the example directory. It is recommended that you use the
> provided Jetty server for optimal performance. If you absolutely must
> use a different servlet container then continue to the next section on
> how to install Solr."
>
> So, it's stated, but maybe not in a way that makes it clear to most
> users. And maybe it needs to be repeated in another section.
> Suggestions?
>
> I did find this page,
> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+Jetty,
> which pretty much contradicts the previous text. I'll fix that now.
>
> Other recommendations for where doc could be more clear are welcome.


Here are a couple of suggestions:



Under the section captioned "Init script to run the Solr example", it
describes init scripts based on whether you are running Jetty 6 or 7.
The problem is, if you use the custom version of Jetty (start.jar)
provided with the distribution (apparently the recommended one to use
based on earlier posts in this thread), you don't know which version of
Jetty this is. Of course, some users might know that running:


   java -jar /var/solr/start.jar --version

will tell them which version of Jetty start.jar is based on. This may 
not be obvious to everyone, however.


The other problem, of course, is that the latest versions of Solr (4.5.1
in particular) are based on Jetty 8, and there is no init script
provided for that version. If the Jetty 7 script still works for Jetty
8, then a note to that effect may help avoid confusion.


Thanks.



Implementing OpenSearch For Solr Responses

2013-10-26 Thread Furkan KAMACI
Hi;

This question may not be directly related to Solr, but I want to generate
OpenSearch-standard responses from my Solr indexes. As usual, I don't expose
my Solr indexes to the outside, and I've implemented a custom API for
querying.

My motivation to support OpenSearch is this:
https://developers.google.com/custom-search/json-api/v1/overview

I also know that there were similar conversations about OpenSearch support
in Solr: http://search-lucene.com/?q=opensearch&fc_project=Solr

Solr does not support it right now. Has anybody implemented the OpenSearch
specification in their custom API, and which libraries or standards did they
use? The OpenSearch documentation is pretty poor, as you can see:
http://www.opensearch.org/Documentation/Frequently_asked_questions#Where_do_I_get_the_XSD_Schemas.3F
There is a title for the XSD schema, but there is no explanation.
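
For what it is worth, wrapping search results in OpenSearch response elements
may not need a dedicated library. The sketch below, in Python with only the
standard library, builds an RSS 2.0 feed carrying the OpenSearch 1.1
totalResults, startIndex and itemsPerPage elements. The function name, its
arguments and the "title"/"url" document keys are hypothetical stand-ins for
whatever a custom API already gets back from Solr; treat it as a starting
point, not a complete implementation of the specification.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the OpenSearch 1.1 response elements.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def to_opensearch_rss(query, num_found, start, rows, docs):
    """Build an RSS 2.0 string with OpenSearch response elements.

    num_found, start and rows mirror the usual Solr paging values;
    docs is a list of dicts with hypothetical "title" and "url" keys.
    """
    ET.register_namespace("opensearch", OPENSEARCH_NS)
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Search results for " + query
    ET.SubElement(channel, "description").text = "Results from a custom search API"
    ET.SubElement(channel, "{%s}totalResults" % OPENSEARCH_NS).text = str(num_found)
    # Solr's start is 0-based; OpenSearch's default index offset is 1.
    ET.SubElement(channel, "{%s}startIndex" % OPENSEARCH_NS).text = str(start + 1)
    ET.SubElement(channel, "{%s}itemsPerPage" % OPENSEARCH_NS).text = str(rows)
    for doc in docs:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = doc["title"]
        ET.SubElement(item, "link").text = doc["url"]
    return ET.tostring(rss, encoding="unicode")


xml_text = to_opensearch_rss(
    "solr", 2, 0, 10,
    [{"title": "First hit", "url": "http://example.com/1"},
     {"title": "Second hit", "url": "http://example.com/2"}],
)
print(xml_text)
```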

Thanks;
Furkan KAMACI


Indexing on plain text data and base64 encode data in a single HTTP POST request

2013-10-26 Thread neerajp
Hi,
I am using Solr to search my email data. My application is written in C++, so
I am using the cURL library to POST the data to Solr for indexing. I am posting
the data in XML format; some of the XML fields are plain text and some are
base64-encoded. I want to know what I should do so that Solr can index both
types of data (plain text as well as base64-encoded data) arriving in a single
XML file.

For reference, my XML file looks like:
"INBOXsolr solr
solr HI I AM EMAIL BODY\r\n\r\nTHANKSSGkgSSBBTSBBVFRBQ0hNRU5U"

In the above XML, all fields are plain US-ASCII characters except
email-attachment, which is base64-encoded. The attachment content type could
be PDF, DOC, a text file, etc.

Any help is highly appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-data-and-base64-encode-data-in-a-single-HTTP-POST-request-tp4097905.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help on solr more like this functionality

2013-10-26 Thread Koji Sekiguchi

Hi Suren,

(13/10/25 23:36), Suren Raju wrote:

> Hi,
>
> We are trying to solve a business problem by performing a Solr
> MoreLikeThis query. We are able to perform the MoreLikeThis search. We
> have a specific use case that requires different boosts on different match
> fields. Say I do MoreLikeThis based on the title and description fields of
> products. I want to give more boost to the match field *title* than to the
> description.
>
> The query I'm trying so far is:
>
> mysolrhost:8983/solr/mlt?q=id:UTF8TEST&mlt.fl=title,description&mlt.mindf=1&mlt.mintf=1
>
> Is there any way to provide a different boost for title and description?



I don't have much experience with MLT, but index-time boosting might help you.
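
Query-time boosting may also be worth checking. The sketch below, in Python,
only assembles a request URL like the one in the quoted question and adds the
mlt.boost and mlt.qf parameters. To my knowledge these are documented
MoreLikeThis parameters, with mlt.qf taking field^boost pairs for fields that
are also listed in mlt.fl, but please treat that as an assumption and verify
it against the documentation for your Solr version. The helper name and the
boost values are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_mlt_url(base_url, doc_id, field_boosts):
    """Build a MoreLikeThis request URL with per-field boosts.

    field_boosts maps field name -> boost, e.g. {"title": 2.0}.
    """
    params = {
        "q": "id:" + doc_id,
        "mlt.fl": ",".join(field_boosts),
        "mlt.mindf": 1,
        "mlt.mintf": 1,
        # Assumed parameters: check them against your Solr version.
        "mlt.boost": "true",
        "mlt.qf": " ".join("%s^%s" % (f, b) for f, b in field_boosts.items()),
    }
    return base_url + "?" + urlencode(params)


url = build_mlt_url(
    "http://mysolrhost:8983/solr/mlt",
    "UTF8TEST",
    {"title": 2.0, "description": 1.0},
)
print(url)
```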

Koji
--
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html


Re: Solr - what's the next big thing?

2013-10-26 Thread Saar Carmi
LOL, Jack. I can imagine Otis saying that.

Otis, with this marriage, are we going to see MapReduce-based queries?
On Oct 25, 2013 10:03 PM, "Jack Krupansky"  wrote:

> But a lot of that big yellow elephant stuff is in 4.x anyway.
>
> (Otis: I was afraid that you were going to say that the next big thing in
> Solr is... Elasticsearch!)
>
> -- Jack Krupansky
>
> -Original Message- From: Otis Gospodnetic
> Sent: Friday, October 25, 2013 2:43 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr - what's the next big thing?
>
> Saar,
>
> The marriage with the big yellow elephant is a big deal. It changes the
> scale.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Oct 25, 2013 5:32 AM, "Saar Carmi"  wrote:
>
>> If I am not mistaken the most impressive improvement of Solr 4.0 compared
>> to previous versions was the Solr Cloud architecture.
>>
>> What would be the next big thing in Solr 5.0 ?
>>
>> Saar
>>
>>
>


Re: Is there a way to standardize the stored values (like using synonyms for indexed values)?

2013-10-26 Thread Anshum Gupta
Perhaps your question is similar to another one on the list.

http://lucene.472066.n3.nabble.com/Normalized-data-during-indexing-td4097750.html#a4097752



On Sat, Oct 26, 2013 at 4:40 AM, Developer  wrote:

> I am trying to figure out a way to standardize the stored values using a
> file
> similar to synonyms.txt file.
>
> For ex:
>
> If I have 3 entries as below
>
> name: apple banana
> name: appleBanana
> name: applebaNana
>
> Mapping
>
> apple banana, appleBanana, applebaNana=> applebanana
>
> I want to just have one entry (overwriting the other - I will be using this
> name as unique id) to be stored in the index.
>
> Not sure if it's possible to do this currently. Can someone help me out?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Is-there-a-way-to-standardize-the-stored-values-like-using-synonyms-for-indexed-values-tp4097846.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
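
The mapping in the quoted question can be sketched outside Solr as follows.
This is a minimal Python illustration, assuming a one-rule-per-line file in
the "a, b, c => canonical" style shown in the question. It is not how Solr
reads synonyms.txt, and the function and key names are hypothetical. Applying
the canonical value before indexing means all three variants share one unique
key, so later documents overwrite earlier ones.

```python
def parse_mapping(lines):
    """Parse 'variant, variant => canonical' rules into a lookup dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        variants, canonical = line.split("=>")
        for variant in variants.split(","):
            mapping[variant.strip()] = canonical.strip()
    return mapping


mapping = parse_mapping(["apple banana, appleBanana, applebaNana => applebanana"])

# Simulate an index keyed by the normalized name (the unique id).
index = {}
for name in ["apple banana", "appleBanana", "applebaNana"]:
    key = mapping.get(name, name)  # unmapped values pass through unchanged
    index[key] = {"id": key, "original": name}

print(index)
```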



-- 

Anshum Gupta
http://www.anshumgupta.net