Re: Updating data

2013-02-05 Thread Dikchant Sahi
If I understand it right, you want the JSON to contain only the new fields and not
the fields that have already been indexed/stored.

Check out Solr Atomic updates. Below are some links which might help.
http://wiki.apache.org/solr/Atomic_Updates
http://yonik.com/solr/atomic-updates/

Remember, atomic updates require the fields to be stored.
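A rough sketch (assuming Solr 4.x JSON atomic-update syntax and the id/new_field names from the example below) of rewriting such a delta file into atomic-update form before posting it to /update:

```python
import json

# Sketch only: rewrite a partial-document delta file such as
# [{"id": 1, "new_field": "xvz"}] into atomic-update syntax,
# where each non-key field is wrapped as {"set": value}.
def to_atomic_updates(docs, unique_key="id"):
    updates = []
    for doc in docs:
        update = {}
        for field, value in doc.items():
            if field == unique_key:
                update[field] = value            # the uniqueKey is sent as-is
            else:
                update[field] = {"set": value}   # "set" replaces the stored value
        updates.append(update)
    return updates

delta = [{"id": 1, "new_field": "xvz"}, {"id": 2, "new_field": "xvz"}]
payload = json.dumps(to_atomic_updates(delta))
# POST this payload to /update with Content-Type: application/json, then commit.
```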

On Tue, Feb 5, 2013 at 12:35 PM, anurag.jain  wrote:

> I have already indexed 180 records in the Solr index. All the files were in JSON
> format.
>
> So the data was like:
>
> [
> {
> "id":1,
> "first_name":"anurag",
> "last_name":"jain",
> ...
> },
>
> {
> "id":2,
> "first_name":"abhishek",
> "last_name":"jain",
> ...
> }, ...
> ]
>
>
> Now I have to add a field to the data, like:
>
>
>
> [
> {
> "id":1,
> "first_name":"anurag",
> "last_name":"jain",
> "new_field":"xvz",
> ...
> },
>
> {
> "id":2,
> "first_name":"abhishek",
> "last_name":"jain",
> "new_field":"xvz",
> ...
> }, ...
> ]
>
>
> But I want my JSON file to be like this:
> [
> {
> "id":1,
> "new_field":"xvz"
> },
> {
> "id":2,
> "new_field":"xvz"
> }
> ]
>
> so that it automatically updates in Solr just as this file would do:
>
> [
> {
> "id":1,
> "first_name":"anurag",
> "last_name":"jain",
> "new_field":"xvz",
> ...
> },
>
> {
> "id":2,
> "first_name":"abhishek",
> "last_name":"jain",
> "new_field":"xvz",
> ...
> }, ...
> ]
>
>
>
> Any solutions? Please reply.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Updating-data-tp4038492.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Tokenized keywords

2013-01-21 Thread Dikchant Sahi
Tokenizers change the behavior of how you search/index, not how you
store. What I understand is that you want to display the tokenized result always,
and not just for debug purposes.

debugQuery has performance implications and should not be used for what
you are trying to achieve.

Basically, what you need is a way to store the filtered and lowercased tokens
in the 'modified' field. What I see as a solution is to either
ingest the 'original' field with your desired tokens directly instead of
using copyField, or write some custom code to store/index only the filtered
and lowercased result, e.g. a custom transformer can be explored if you are
using the data import handler.
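A minimal illustration of the "pre-compute the value before ingesting" idea (the stopword list is a stand-in, not Solr's actual stopwords.txt, and a stemmer would additionally be needed to turn "laptops" into "laptop"):

```python
# Illustrative only: pre-compute the filtered, lowercased text outside Solr
# and ingest it into the 'modified' field directly instead of via copyField.
STOPWORDS = {"for", "all", "the", "a", "an", "of", "to"}

def precompute_modified(original):
    tokens = original.lower().split()
    return " ".join(t for t in tokens if t not in STOPWORDS)

print(precompute_modified("Search for all the Laptops"))  # -> search laptops
```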


On Mon, Jan 21, 2013 at 1:47 PM, Romita Saha
wrote:

> Hi,
>
> I have a field defined in schema.xml named 'original'. I first copy
> this field to "modified" and apply filters on this "modified" field.
>
> <field name="original" type="string" indexed="true" stored="true"/>
> <field name="modified" type="text_general" indexed="true" stored="true"/>
>
> <copyField source="original" dest="modified"/>
>
> I want to display in my response as follows:
>
> original: Search for all the Laptops
> modified: search laptop
>
> Thanks and regards,
> Romita Saha
>
> Panasonic R&D Center Singapore
> Blk 1022 Tai Seng Avenue #06-3530
> Tai Seng Ind. Est. Singapore 534415
> DID: (65) 6550 5383 FAX: (65) 6550 5459
> email: romita.s...@sg.panasonic.com
>
>
>
> From:   Mikhail Khludnev 
> To: solr-user@lucene.apache.org,
> Date:   01/21/2013 03:48 PM
> Subject:Re: Tokenized keywords
>
>
>
> Romita,
> That's exactly what is shown in the debugQuery output. If you can't find it there,
> paste the output here and let's try to find it together. Also pay attention to the
> explainOther debug parameter and the analysis page in the admin UI.
> On 21.01.2013 10:50, "Romita Saha" 
> wrote:
>
> > What I am trying to achieve is as follows.
> >
> > I query "Search for all the Laptops" and my tokenized key words are
> > "search laptop" (I apply stopword filter to filter out words like
> > for,all,the and i also user lowercase filter).
> > I want to display these tokenized keywords using debugQuery.
> >
> > Thanks and regards,
> > Romita
> >
> >
> >
> > From:   Dikchant Sahi 
> > To: solr-user@lucene.apache.org,
> > Date:   01/21/2013 02:26 PM
> > Subject:Re: Tokenized keywords
> >
> >
> >
> > Can you please elaborate a bit more on what you are trying to achieve.
> >
> > Tokenizers work on the indexed field and don't affect how the values will be
> > displayed. The response value comes from the stored field. If you want to see
> > how your query is being tokenized, you can do it using the analysis interface
> > or enable debugQuery to see how your query is being formed.
> >
> >
> > On Mon, Jan 21, 2013 at 11:06 AM, Romita Saha
> > wrote
> >
> > > Hi,
> > >
> > > I use some tokenizers to tokenize the query. I want to see the tokenized
> > > query words displayed in the response. Could you kindly help me do
> > > that.
> > >
> > > Thanks and regards,
> > > Romita
> >
> >
>
>
>


Re: Tokenized keywords

2013-01-20 Thread Dikchant Sahi
Can you please elaborate a bit more on what you are trying to achieve.

Tokenizers work on the indexed field and don't affect how the values will be
displayed. The response value comes from the stored field. If you want to see
how your query is being tokenized, you can do it using the analysis interface
or enable debugQuery to see how your query is being formed.


On Mon, Jan 21, 2013 at 11:06 AM, Romita Saha
wrote

> Hi,
>
> I use some tokenizers to tokenize the query. I want to see the tokenized
> query words displayed in the response. Could you kindly help me do that.
>
> Thanks and regards,
> Romita


Re: MultiValue

2013-01-17 Thread Dikchant Sahi
You mean to say that the problem is with the JSON which is being ingested.

What you are trying to achieve is to split the values on the basis of commas
and index them as multiple values.

What problem are you facing in indexing the JSON in the format Solr expects? If you
don't have control over it, you can probably try playing with custom update
processors.
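If the JSON can be touched before ingestion, a simple client-side sketch (not from the thread) of the comma-splitting step:

```python
# Hedged sketch: flatten comma-separated entries client-side before sending
# the JSON to Solr, so the multiValued field gets one value per skill.
def split_values(values):
    out = []
    for v in values:
        out.extend(part.strip() for part in v.split(",") if part.strip())
    return out

doc = {"last_name": "jain", "training_skill": ["c", "c++", "php,java,.net"]}
doc["training_skill"] = split_values(doc["training_skill"])
print(doc["training_skill"])  # -> ['c', 'c++', 'php', 'java', '.net']
```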




On Fri, Jan 18, 2013 at 12:31 AM, anurag.jain  wrote:

>   [ { "last_name" : "jain", "training_skill":["c", "c++", "php,java,.net"] } ]
>
> Actually I want to tokenize it into: c c++ php java .net
>
>
> So through this I can use them as facets.
>
>
> But the problem is in the list:
> "training_skill":["c", "c++", *"php,java,.net"*]
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/MultiValue-tp4034305p4034316.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: MultiValue

2013-01-17 Thread Dikchant Sahi
You just need to make the field multiValued, with a declaration along these lines
(the exact type should be set based on your search requirements):

<field name="training_skill" type="text_general" indexed="true" stored="true" multiValued="true"/>

On Thu, Jan 17, 2013 at 11:27 PM, anurag.jain  wrote:

> my json file look like
>
> [ { "last_name" : "jain", "training_skill":["c", "c++", "php,java,.net"] }]
>
> Can you please suggest how I should declare the field in the schema for the
> "training_skill" field?
>
>
>
> please reply
>
> urgent
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/MultiValue-tp4034305.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr 3.6.2 or 4.0

2013-01-04 Thread Dikchant Sahi
As someone in the forum correctly said, if all previous Solr releases were
evolutionary, Solr 4.0 is revolutionary. It has lots of improvements over the
previous releases, like NoSQL features, atomic updates, cloud features and a
lot more.

Solr 4.0 would be the right migration target, I believe.

Can someone in the forum provide a reason to migrate to 3.6.2 and not 4.0?

On Fri, Jan 4, 2013 at 5:16 PM, vijeshnair  wrote:

> We are starting a new e-com application from this month onwards, for which I
> am trying to identify the right SOLR release. We were using 3.4 in our
> previous project, but I have read in multiple blogs and forums about the
> improvements that SOLR 4 has in terms of efficient memory management, fewer
> OOMs etc. So my question would be: can I start using SOLR 4 for my new
> project? Why is Apache keeping both the 3.6.2 and 4.0 releases in the
> downloads? Are there any major changes in 4.0 compared to 3.x, so that I
> should study those changes before getting into 4.0? Please help, so that
> I can propose 4.0 to my team.
>
> Thanks
> Vijesh Nair
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-3-6-2-or-4-0-tp4030527.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr atomic update of multi-valued field

2012-12-19 Thread Dikchant Sahi
Hi Erick,

The name field is stored. I experience the problem only when I update the
multiValued field with multiple values, like:

<field name="skills" update="set">solr</field>
<field name="skills" update="set">lucene</field>

It works perfectly when I set a single value for the multiValued field, like:

<field name="skills" update="set">solr</field>

Thanks,
Dikchant

On Wed, Dec 19, 2012 at 6:25 PM, Erick Erickson wrote:

> First question: Is the "name" field stored (stored="true")? If it isn't,
> that would explain your problems with that field. _all_ relevant fields
> (i.e. everything not a destination of a copyField) need to be stored for
> atomic updates to work.
>
> Your second problem I'm not sure about. I remember some JIRAs about
> multivalued fields and atomic updates, you might get some info from the
> JIRAs here: https://issues.apache.org/jira/browse/SOLR
>
> but updating multiValued fields _should_ work...
>
> Best
> Erick
>
>
> On Tue, Dec 18, 2012 at 2:20 AM, Dikchant Sahi  >wrote:
>
> > Hi,
> >
> > Does Solr 4.0 allow updating the values of a multi-valued field? Say I have a
> > list of values for the skills field, like java, j2ee, and I want to change it
> > to solr, lucene.
> >
> > I was trying to play with atomic updates and below is my observation:
> >
> > I have the following document in my index:
> > <doc>
> >   <str name="id">1</str>
> >   <str name="name">Dikchant</str>
> >   <str name="designation">software engineer</str>
> >   <arr name="skills">
> >     <str>java</str>
> >     <str>j2ee</str>
> >   </arr>
> > </doc>
> >
> > To update the skills to solr, lucene, I indexed the document as follows:
> >
> > <add>
> >   <doc>
> >     <field name="id">1</field>
> >     <field name="skills" update="set">solr</field>
> >     <field name="skills" update="set">lucene</field>
> >   </doc>
> > </add>
> >
> > The document added to the index is as follows:
> > <doc>
> >   <str name="id">1</str>
> >   <arr name="skills">
> >     <str>{set=solr}</str>
> >     <str>{set=lucene}</str>
> >   </arr>
> > </doc>
> >
> > This is not what I was looking for. I found 2 issues:
> > 1. The value of the name field was lost.
> > 2. The skills field had some junk like {set=solr} and {set=lucene}.
> > Then, to achieve my goal, I tried something different. I tried setting some
> > single-valued field with the update="set" parameter to the same value, and also
> > provided the values of the multi-valued field as we do while adding a new
> > document.
> >
> > <add>
> >   <doc>
> >     <field name="id">1</field>
> >     <field name="name" update="set">Dikchant</field>
> >     <field name="skills">solr</field>
> >     <field name="skills">lucene</field>
> >   </doc>
> > </add>
> >
> > With this, the index looks as follows:
> > <doc>
> >   <str name="id">1</str>
> >   <str name="name">Dikchant</str>
> >   <str name="designation">software engineer</str>
> >   <arr name="skills">
> >     <str>solr</str>
> >     <str>lucene</str>
> >   </arr>
> > </doc>
> >
> > The values of the multivalued field are changed and the values of the other
> > fields are not deleted.
> >
> > The question that comes to my mind is: does Solr 4.0 allow updates of
> > multi-valued fields? If yes, is this how it works, or am I doing something
> > wrong?
> >
> > Regards,
> > Dikchant
> >
>


Solr atomic update of multi-valued field

2012-12-17 Thread Dikchant Sahi
Hi,

Does Solr 4.0 allow updating the values of a multi-valued field? Say I have a
list of values for the skills field, like java, j2ee, and I want to change it
to solr, lucene.

I was trying to play with atomic updates and below is my observation:

I have the following document in my index:
<doc>
  <str name="id">1</str>
  <str name="name">Dikchant</str>
  <str name="designation">software engineer</str>
  <arr name="skills">
    <str>java</str>
    <str>j2ee</str>
  </arr>
</doc>

To update the skills to solr, lucene, I indexed the document as follows:

<add>
  <doc>
    <field name="id">1</field>
    <field name="skills" update="set">solr</field>
    <field name="skills" update="set">lucene</field>
  </doc>
</add>

The document added to the index is as follows:
<doc>
  <str name="id">1</str>
  <arr name="skills">
    <str>{set=solr}</str>
    <str>{set=lucene}</str>
  </arr>
</doc>

This is not what I was looking for. I found 2 issues:
1. The value of the name field was lost.
2. The skills field had some junk like {set=solr} and {set=lucene}.
Then, to achieve my goal, I tried something different. I tried setting some
single-valued field with the update="set" parameter to the same value, and also
provided the values of the multi-valued field as we do while adding a new
document.

<add>
  <doc>
    <field name="id">1</field>
    <field name="name" update="set">Dikchant</field>
    <field name="skills">solr</field>
    <field name="skills">lucene</field>
  </doc>
</add>
With this, the index looks as follows:
<doc>
  <str name="id">1</str>
  <str name="name">Dikchant</str>
  <str name="designation">software engineer</str>
  <arr name="skills">
    <str>solr</str>
    <str>lucene</str>
  </arr>
</doc>

The values of the multivalued field are changed and the values of the other
fields are not deleted.

The question that comes to my mind is: does Solr 4.0 allow updates of
multi-valued fields? If yes, is this how it works, or am I doing something
wrong?

Regards,
Dikchant
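As a side note not taken from the thread: the JSON atomic-update syntax also accepts a list inside "set", which is one way to replace all values of a multiValued field in a single update (the id and skills names here mirror the example above):

```python
import json

# Sketch of a JSON atomic update that sets a multiValued field in one go;
# whether this behaves correctly in a given Solr 4.x build should be verified.
update = [{"id": "1", "skills": {"set": ["solr", "lucene"]}}]
payload = json.dumps(update)
# POST to /update with Content-Type: application/json, then commit.
```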


Re: Update / replication of offline indexes

2012-12-17 Thread Dikchant Sahi
Thanks Erick and Upayavira! This answers my question.


On Mon, Dec 17, 2012 at 8:05 AM, Erick Erickson wrote:

> See the very last line here:
> http://wiki.apache.org/solr/MergingSolrIndexes
>
> Short answer is that merging will lead to duplicate documents, even with
> uniqueKeys defined.
>
> So you're really kind of stuck handling this outside of merge, either by
> shipping the
> list of overwritten docs and deleting them from the base index or shipping
> the JSON/XML
> format and indexing those. Of the  two, I'd think the latter is
> easiest/least prone to surprises.
> Especially since you could re-run the indexing as many times as necessary.
>
> The UniqueKey bits are only guaranteed to overwrite older docs when
> indexing, not merging.
>
> Best
> Erick
>
>
> On Thu, Dec 13, 2012 at 3:17 PM, Dikchant Sahi  >wrote:
>
> > Hi Alex,
> >
> > You got my point right. What I see is that merge adds duplicate documents. Is
> > there a way to overwrite existing documents in one core with those from another? Can
> > a merge operation lead to data corruption, say in the case when the core on the
> > client had uncommitted changes?
> >
> > What would be the better solution for my requirement: merge, or indexing
> > XML/JSON?
> >
> > Regards,
> > Dikchant
> >
> > On Thu, Dec 13, 2012 at 6:39 PM, Alexandre Rafalovitch
> > wrote:
> >
> > > Not sure I fully understood this and maybe you already cover that by
> > > 'merge', but if you know what you gave the client last time, you can
> just
> > > build a differential as a second core, then on client mount that second
> > > core and merge it into the first one (e.g. with DIH).
> > >
> > > Just a thought.
> > >
> > > Regards,
> > >Alex.
> > >
> > > Personal blog: http://blog.outerthoughts.com/
> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > - Time is the quality of nature that keeps events from happening all at
> > > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
> > >
> > >
> > >
> > > On Thu, Dec 13, 2012 at 5:28 PM, Dikchant Sahi  > > >wrote:
> > >
> > > > Hi Erick,
> > > >
> > > > Sorry for creating the confusion. By slave, I mean the indexes on the client
> > > > machine will be a replica of the master, and not the same as the slave in the
> > > > master-slave model. Below is the detail:
> > > >
> > > > The system is being developed to support search facility on 1000s of
> > > > system, a majority of which will be offline.
> > > >
> > > > The idea is that we will have a search system which will be sold
> > > > on subscription basis. For each of the subscriber, we will copy the
> > > master
> > > > index to their local machine, over a drive or CD. Now, if a
> subscriber
> > > > comes after 2 months and want the updates, we just want to provide
> the
> > > > deltas for 2 month as the volume of data is huge. For this we can
> think
> > > of
> > > > two approaches:
> > > > 1. Fetch the documents which are less than 2 months old  in JSON
> format
> > > > from master Solr. Copy it to the subscriber machine
> > > > and index those documents. (copy through cd / memory sticks)
> > > > 2. Create separate indexes for each month on our master machine. Copy
> > the
> > > > indexes to the client machine and merge. Prior to merge we need to
> > delete
> > > > records which the new index has, to avoid duplicates.
> > > >
> > > > As long as the setup is new, we will copy the complete index and
> > restart
> > > > Solr. We are not sure of the best approach for copying the deltas.
> > > >
> > > > Thanks,
> > > > Dikchant
> > > >
> > > >
> > > >
> > > > On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson <
> > erickerick...@gmail.com
> > > > >wrote:
> > > >
> > > > > This is somewhat confusing. You say that box2 is the slave, yet
> > they're
> > > > not
> > > > > connected? Then you need to copy the /data index from
> box
> > 1
> > > to
> > > > > box 2 manually (I'd have box2 solr shut down at the time) and
> restart
> > > > Solr.
> > > > >
> > > > > Why can't the boxes be connected? That's a much simpler way 

Re: Update / replication of offline indexes

2012-12-13 Thread Dikchant Sahi
Yes, we have a uniqueId defined, but merge adds two documents with the same
id. As per my understanding this is how Solr behaves; correct me if I am
wrong.

On Fri, Dec 14, 2012 at 2:25 AM, Alexandre Rafalovitch
wrote:

> Do you have IDs defined? How do you expect Solr to know they are duplicate
> records? Maybe the issue is there somewhere.
>
> Regards,
>  Alex
> On 13 Dec 2012 15:17, "Dikchant Sahi"  wrote:
>
> > Hi Alex,
> >
> > You got my point right. What I see is that merge adds duplicate documents. Is
> > there a way to overwrite existing documents in one core with those from another? Can
> > a merge operation lead to data corruption, say in the case when the core on the
> > client had uncommitted changes?
> >
> > What would be the better solution for my requirement: merge, or indexing
> > XML/JSON?
> >
> > Regards,
> > Dikchant
> >
> > On Thu, Dec 13, 2012 at 6:39 PM, Alexandre Rafalovitch
> > wrote:
> >
> > > Not sure I fully understood this and maybe you already cover that by
> > > 'merge', but if you know what you gave the client last time, you can
> just
> > > build a differential as a second core, then on client mount that second
> > > core and merge it into the first one (e.g. with DIH).
> > >
> > > Just a thought.
> > >
> > > Regards,
> > >Alex.
> > >
> > > Personal blog: http://blog.outerthoughts.com/
> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > - Time is the quality of nature that keeps events from happening all at
> > > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
> > >
> > >
> > >
> > > On Thu, Dec 13, 2012 at 5:28 PM, Dikchant Sahi  > > >wrote:
> > >
> > > > Hi Erick,
> > > >
> > > > Sorry for creating the confusion. By slave, I mean the indexes on the client
> > > > machine will be a replica of the master, and not the same as the slave in the
> > > > master-slave model. Below is the detail:
> > > >
> > > > The system is being developed to support search facility on 1000s of
> > > > system, a majority of which will be offline.
> > > >
> > > > The idea is that we will have a search system which will be sold
> > > > on subscription basis. For each of the subscriber, we will copy the
> > > master
> > > > index to their local machine, over a drive or CD. Now, if a
> subscriber
> > > > comes after 2 months and want the updates, we just want to provide
> the
> > > > deltas for 2 month as the volume of data is huge. For this we can
> think
> > > of
> > > > two approaches:
> > > > 1. Fetch the documents which are less than 2 months old  in JSON
> format
> > > > from master Solr. Copy it to the subscriber machine
> > > > and index those documents. (copy through cd / memory sticks)
> > > > 2. Create separate indexes for each month on our master machine. Copy
> > the
> > > > indexes to the client machine and merge. Prior to merge we need to
> > delete
> > > > records which the new index has, to avoid duplicates.
> > > >
> > > > As long as the setup is new, we will copy the complete index and
> > restart
> > > > Solr. We are not sure of the best approach for copying the deltas.
> > > >
> > > > Thanks,
> > > > Dikchant
> > > >
> > > >
> > > >
> > > > On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson <
> > erickerick...@gmail.com
> > > > >wrote:
> > > >
> > > > > This is somewhat confusing. You say that box2 is the slave, yet
> > they're
> > > > not
> > > > > connected? Then you need to copy the /data index from
> box
> > 1
> > > to
> > > > > box 2 manually (I'd have box2 solr shut down at the time) and
> restart
> > > > Solr.
> > > > >
> > > > > Why can't the boxes be connected? That's a much simpler way of
> going
> > > > about
> > > > > it.
> > > > >
> > > > > Best
> > > > > Erick
> > > > >
> > > > >
> > > > > On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi <
> > contacts...@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > Hi Walter,
> > > > > >
> > > >

Re: Update / replication of offline indexes

2012-12-13 Thread Dikchant Sahi
Hi Alex,

You got my point right. What I see is that merge adds duplicate documents. Is
there a way to overwrite existing documents in one core with those from another? Can
a merge operation lead to data corruption, say in the case when the core on the
client had uncommitted changes?

What would be the better solution for my requirement: merge, or indexing
XML/JSON?

Regards,
Dikchant

On Thu, Dec 13, 2012 at 6:39 PM, Alexandre Rafalovitch
wrote:

> Not sure I fully understood this and maybe you already cover that by
> 'merge', but if you know what you gave the client last time, you can just
> build a differential as a second core, then on client mount that second
> core and merge it into the first one (e.g. with DIH).
>
> Just a thought.
>
> Regards,
>Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
>
> On Thu, Dec 13, 2012 at 5:28 PM, Dikchant Sahi  >wrote:
>
> > Hi Erick,
> >
> > Sorry for creating the confusion. By slave, I mean the indexes on the client
> > machine will be a replica of the master, and not the same as the slave in the
> > master-slave model. Below is the detail:
> >
> > The system is being developed to support search facility on 1000s of
> > system, a majority of which will be offline.
> >
> > The idea is that we will have a search system which will be sold
> > on subscription basis. For each of the subscriber, we will copy the
> master
> > index to their local machine, over a drive or CD. Now, if a subscriber
> > comes after 2 months and want the updates, we just want to provide the
> > deltas for 2 month as the volume of data is huge. For this we can think
> of
> > two approaches:
> > 1. Fetch the documents which are less than 2 months old  in JSON format
> > from master Solr. Copy it to the subscriber machine
> > and index those documents. (copy through cd / memory sticks)
> > 2. Create separate indexes for each month on our master machine. Copy the
> > indexes to the client machine and merge. Prior to merge we need to delete
> > records which the new index has, to avoid duplicates.
> >
> > As long as the setup is new, we will copy the complete index and restart
> > Solr. We are not sure of the best approach for copying the deltas.
> >
> > Thanks,
> > Dikchant
> >
> >
> >
> > On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson  > >wrote:
> >
> > > This is somewhat confusing. You say that box2 is the slave, yet they're
> > not
> > > connected? Then you need to copy the /data index from box 1
> to
> > > box 2 manually (I'd have box2 solr shut down at the time) and restart
> > Solr.
> > >
> > > Why can't the boxes be connected? That's a much simpler way of going
> > about
> > > it.
> > >
> > > Best
> > > Erick
> > >
> > >
> > > On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi  > > >wrote:
> > >
> > > > Hi Walter,
> > > >
> > > > Thanks for the response.
> > > >
> > > > Commit will help to reflect changes on Box1. We are able to achieve
> > this.
> > > > We want the changes to reflect in Box2.
> > > >
> > > > We have two indexes. Say
> > > > Box1: Master & DB has been setup. Data Import runs on this.
> > > > Box2: Slave running.
> > > >
> > > > We want all the updates on Box1 to be merged/present in the index on Box2. Both
> > > > the boxes are not connected over a network. How can we achieve this?
> > > >
> > > > Please let me know, if am not clear.
> > > >
> > > > Thanks again!
> > > >
> > > > Regards,
> > > > Dikchant
> > > >
> > > > On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood <
> > > wun...@wunderwood.org
> > > > >wrote:
> > > >
> > > > > You do not need to manage online and offline indexes. Commit when
> you
> > > are
> > > > > done with your updates and Solr will take care of it for you. The
> > > changes
> > > > > are not live until you commit.
> > > > >
> > > > > wunder
> > > > >
> > > > > On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:
> > > > >
> > > > > > Hi,
> > > > > &

Re: Update / replication of offline indexes

2012-12-12 Thread Dikchant Sahi
Hi Erick,

Sorry for creating the confusion. By slave, I mean the indexes on the client
machine will be a replica of the master, and not the same as the slave in the
master-slave model. Below is the detail:

The system is being developed to support a search facility on 1000s of
systems, a majority of which will be offline.

The idea is that we will have a search system which will be sold on a
subscription basis. For each of the subscribers, we will copy the master
index to their local machine, over a drive or CD. Now, if a subscriber
comes after 2 months and wants the updates, we just want to provide the
deltas for those 2 months, as the volume of data is huge. For this we can think of
two approaches:
1. Fetch the documents which are less than 2 months old in JSON format
from the master Solr, copy them to the subscriber machine,
and index those documents (copy through CD / memory sticks).
2. Create separate indexes for each month on our master machine. Copy the
indexes to the client machine and merge. Prior to merging we need to delete
the records which the new index has, to avoid duplicates.

As long as the setup is new, we will copy the complete index and restart
Solr. We are not sure of the best approach for copying the deltas.
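The delete step in approach 2 could be scripted along these lines (a rough sketch; the id list would have to be shipped alongside the delta index):

```python
# Build a Solr <delete> command for every uniqueKey present in the delta
# index, to be run on the client core before merging.
def build_delete_command(ids):
    body = "".join("<id>%s</id>" % i for i in ids)
    return "<delete>%s</delete>" % body

print(build_delete_command(["1", "2"]))
# -> <delete><id>1</id><id>2</id></delete>
```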

Thanks,
Dikchant



On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson wrote:

> This is somewhat confusing. You say that box2 is the slave, yet they're not
> connected? Then you need to copy the /data index from box 1 to
> box 2 manually (I'd have box2 solr shut down at the time) and restart Solr.
>
> Why can't the boxes be connected? That's a much simpler way of going about
> it.
>
> Best
> Erick
>
>
> On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi  >wrote:
>
> > Hi Walter,
> >
> > Thanks for the response.
> >
> > Commit will help to reflect changes on Box1. We are able to achieve this.
> > We want the changes to reflect in Box2.
> >
> > We have two indexes. Say
> > Box1: Master & DB has been setup. Data Import runs on this.
> > Box2: Slave running.
> >
> > We want all the updates on Box1 to be merged/present in the index on Box2. Both
> > the boxes are not connected over a network. How can we achieve this?
> >
> > Please let me know if I am not clear.
> >
> > Thanks again!
> >
> > Regards,
> > Dikchant
> >
> > On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood <
> wun...@wunderwood.org
> > >wrote:
> >
> > > You do not need to manage online and offline indexes. Commit when you
> are
> > > done with your updates and Solr will take care of it for you. The
> changes
> > > are not live until you commit.
> > >
> > > wunder
> > >
> > > On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:
> > >
> > > > Hi,
> > > >
> > > > How can we do delta update of offline indexes?
> > > >
> > > > We have the master index on which data import will be done. The index
> > > > directory will be copied to slave machine in case of full update,
> > through
> > > > CD as the  slave/client machine is offline.
> > > > So, what should be the approach for getting the delta to the slave. I
> > can
> > > > think of two approaches.
> > > >
> > > > 1.Create separate indexes of the delta on the master machine, copy it
> > to
> > > > the slave machine and merge. Before merging the indexes on the client
> > > > machine, delete all the updated and deleted documents in client
> machine
> > > > else merge will add duplicates. So along with the index, we need to
> > > > transfer the list of documents which has been updated/deleted.
> > > >
> > > > 2. Extract all the documents which has changed since a particular
> time
> > in
> > > > XML/JSON and index it in client machine.
> > > >
> > > > The size of indexes are huge, so we cannot rollover index everytime.
> > > >
> > > > Please help me with your take and challenges you see in the above
> > > > approaches. Please suggest if you think of any other better approach.
> > > >
> > > > Thanks a ton!
> > > >
> > > > Regards,
> > > > Dikchant
> > >
> > > --
> > > Walter Underwood
> > > wun...@wunderwood.org
> > >
> > >
> > >
> > >
> >
>


Re: Update multiple documents

2012-12-11 Thread Dikchant Sahi
My intention is to allow search on person names in the second index also.
If we use personId in the second index, is there a way to achieve that?

Yes, we are looking for join kind of feature.

Thanks!
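One speculative direction, not confirmed in the thread: if index2 stored personId instead of the name, Solr 4.0's join query parser (with the fromIndex local param for a second core) could still resolve name searches against index2. Core and field names below are assumptions:

```python
# Sketch of a cross-core join query: find mapping docs whose person field
# (holding personId) matches persons whose name matches the search term.
person_query = 'person_name:"Michael Jackson"'
q = "{!join fromIndex=person from=personId to=person}" + person_query
params = {"q": q, "fl": "id,fieldx,fieldy"}
print(params["q"])
```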

On Wed, Dec 12, 2012 at 8:31 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> But is that the best approach?  If you use personIds in your second index
> then you don't have to do that. Maybe you are after joins in Solr?
>
> Otis
> --
> SOLR Performance Monitoring - http://sematext.com/spm
> On Dec 11, 2012 1:21 PM, "Dikchant Sahi"  wrote:
>
> > Hi,
> >
> > We have two set of related indexes.
> >
> > Index1: person(personId, person_name, field2, field3)
> > Index2: mapping (id, fieldx, fieldy, person)
> >
> > Whenever any person name changes, we need to update both the indexes. For the
> > person index, we can update the person name, as we have personId, which is
> > the uniqueKey. How can we update the person names in index2?
> >
> > Eg:
> > Index1: person(001, Micheal Jackson, value1, value2)
> >
> > Index2: mapping(1234, Thriller, Micheal Jackson)
> > (1235, Billy Jean, Micheal Jackson)
> >
> > *Micheal* Jackson changes to *Michael* Jackson
> >
> > What would be the best approach solution to this problem.
> >
> > Thanks,
> > Dikchant
> >
>


Re: Update / replication of offline indexes

2012-12-10 Thread Dikchant Sahi
Hi Walter,

Thanks for the response.

Commit will help to reflect changes on Box1. We are able to achieve this.
We want the changes to reflect in Box2.

We have two indexes. Say
Box1: Master & DB has been setup. Data Import runs on this.
Box2: Slave running.

We want all the updates on Box1 to be merged/present in the index on Box2. Both
the boxes are not connected over a network. How can we achieve this?

Please let me know if I am not clear.

Thanks again!

Regards,
Dikchant

On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood wrote:

> You do not need to manage online and offline indexes. Commit when you are
> done with your updates and Solr will take care of it for you. The changes
> are not live until you commit.
>
> wunder
>
> On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:
>
> > Hi,
> >
> > How can we do delta update of offline indexes?
> >
> > We have the master index on which data import will be done. The index
> > directory will be copied to slave machine in case of full update, through
> > CD as the  slave/client machine is offline.
> > So, what should be the approach for getting the delta to the slave. I can
> > think of two approaches.
> >
> > 1.Create separate indexes of the delta on the master machine, copy it to
> > the slave machine and merge. Before merging the indexes on the client
> > machine, delete all the updated and deleted documents in client machine
> > else merge will add duplicates. So along with the index, we need to
> > transfer the list of documents which has been updated/deleted.
> >
> > 2. Extract all the documents which has changed since a particular time in
> > XML/JSON and index it in client machine.
> >
> > The size of indexes are huge, so we cannot rollover index everytime.
> >
> > Please help me with your take and challenges you see in the above
> > approaches. Please suggest if you think of any other better approach.
> >
> > Thanks a ton!
> >
> > Regards,
> > Dikchant
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Re: multiple indexes?

2012-11-30 Thread Dikchant Sahi
Multiple indexes can be setup using the multi core feature of Solr.

Below are the steps:
1. Add the core name and storage location of the core to
the $SOLR_HOME/solr.xml file.
  
**
**
  

2. Create the core-directories specified and following sub-directories in
it:
- conf: Contains the configs and schema definition
- lib: Contains the required libraries
- data: Will be created automatically on first run. This would contain
the actual index.

While indexing the docs, you specify the core name in the url as follows:
  http://<host>:<port>/solr/<corename>/update

Similarly, you specify the core name while querying.
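As a small illustration of the per-core URL scheme above (host, port, and core name are placeholders, not part of any real deployment):

```python
def core_url(host: str, port: int, core: str, handler: str) -> str:
    """Build the per-core Solr URL: each core has its own /update and /select."""
    return f"http://{host}:{port}/solr/{core}/{handler}"

# Indexing goes to one core's update handler, querying to its select handler:
update = core_url("localhost", 8983, "core0", "update")
select = core_url("localhost", 8983, "core0", "select")
print(update)  # http://localhost:8983/solr/core0/update
```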

Please refer to Solr Wiki, it has the complete details.

Hope this helps!

- Dikchant

On Sat, Dec 1, 2012 at 10:41 AM, Joe Zhang  wrote:

> May I ask: how to set up multiple indexes, and specify which index to send
> the docs to at indexing time, and later on, how to specify which index to
> work with?
>
> A related question: what is the storage location and structure of solr
> indexes?
>
> Thanks in advance, guys!
>
> Joe.
>


Re: solr issue with seaching words

2012-09-04 Thread Dikchant Sahi
Try debugging it using the analysis page or by running the query in debug mode
(&debugQuery=true).

In the analysis page, add 'RCA-Jack/' to the index side and 'jacke' to the
query side. This might help you understand the behavior.

If you are still unable to debug it, some additional information would be
needed to help.

On Tue, Sep 4, 2012 at 3:38 PM, zainu  wrote:

> I am facing a strange problem. I am searching for the word "jacke" but Solr
> also returns results where my description contains 'RCA-Jack/'. If I search
> "jacka" or "jackc" or "jackd", it works fine and does not return me any
> result, which is what I am expecting in this case.
>
> Only when there is "jacke" does it return results with "RCA-Jack/". So there
> seems to be some kind of relationship between "e" and "/", and it considers
> e as "/".
>
> Any help?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-issue-with-seaching-words-tp4005200.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Search results not returned for a str field

2012-07-20 Thread Dikchant Sahi
defaultSearchField is the field that is queried when you don't explicitly
specify a field to query on.

Please refer to the below link:
http://wiki.apache.org/solr/SchemaXml
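For illustration only (a sketch — the field and type names below are assumptions based on the question, not the poster's actual schema): making the name field an analyzed text type and the default search field would let a bare q=iPod match inside the title.

```xml
<!-- schema.xml sketch; "text_general" is the stock example-schema type -->
<field name="name" type="text_general" indexed="true" stored="true"/>

<!-- with no explicit field in the query, q=iPod searches this field -->
<defaultSearchField>name</defaultSearchField>
```

An explicit field query such as q=name:iPod works without defaultSearchField, but only once the field type actually tokenizes the stored value.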

On Sat, Jul 21, 2012 at 12:56 AM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Hello, Lakshmi,
>
> The issue is the fieldType you've assigned to the fields in your
> schema does not perform any analysis on the string before indexing it.
> So it will only do exact matches. If you want to do matches against
> portions of the field value, use one of the "text" types that come in
> the default schema.
>
> Michael Della Bitta
>
> 
> Appinions, Inc. -- Where Influence Isn’t a Game.
> http://www.appinions.com
>
>
> On Fri, Jul 20, 2012 at 3:18 PM, Lakshmi Bhargavi
>  wrote:
> > Hi ,
> >
> > I have the following configuration
> > 
> >
> >
> > 
> >   
> > > omitNorms="true"/>
> >   
> >
> >  
> >
> >> multiValued="false" required="true"/>
> >> multiValued="false" />
> >> multiValued="false" />
> >> multiValued="false" />
> >  
> >
> >
> >  id
> >
> >
> >  name
> >
> >
> >  
> > 
> >
> > I am also attaching the solr config file
> > http://lucene.472066.n3.nabble.com/file/n3996313/solrconfig.xml
> > solrconfig.xml
> >
> > I indexed a document
> >
> > 
> >   MA147LL/A
> >   Apple 60 GB iPod with Video Playback Black
> >
> > 
> >
> > When I do a wildcard search , the results are returned
> > http://localhost:8983/solr/select?q=*:*
> >
> >   
> > - 
> > - 
> >   0
> >   1
> >   
> > - 
> > - 
> >   MA147LL/A
> >   Apple 60 GB iPod with Video Playback Black
> >   
> >   
> >   
> >
> >
> > but the results are not returned for specific query
> > http://localhost:8983/solr/core0/select?q=iPod
> >
> > 
> > - 
> > - 
> >   0
> >   5
> >   
> >   
> >   
> >
> > Could someone please let me know what is wrong? Also, it would be very
> > helpful if someone could explain the significance of the
> > defaultSearchField.
> >
> > Thanks,
> > lakshmi
> >
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Search-results-not-returned-for-a-str-field-tp3996313.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: NGram for misspelt words

2012-07-18 Thread Dikchant Sahi
Have you tried the analysis window to debug?

I believe you are doing something wrong in the fieldType.

On Wed, Jul 18, 2012 at 8:07 PM, Husain, Yavar  wrote:

> Thanks Sahi. I have replaced my EdgeNGramFilterFactory with
> NGramFilterFactory as I need substrings not just at the front or back but
> anywhere.
> You are right, I put the same NGramFilterFactory in both query and index;
> however, now it does not return any results, not even the basic ones.
>
> -Original Message-----
> From: Dikchant Sahi [mailto:contacts...@gmail.com]
> Sent: Wednesday, July 18, 2012 7:54 PM
> To: solr-user@lucene.apache.org
> Subject: Re: NGram for misspelt words
>
> You are creating grams only while indexing and not while querying, hence
> 'ludlwo' would not match. Your analyzer will create the following grams
> while indexing 'ludlow': lu lud ludl ludlo ludlow, and hence would not
> match 'ludlwo'.
>
> Either you need to create grams while querying as well, or use edit distance.
>
> On Wed, Jul 18, 2012 at 7:43 PM, Husain, Yavar 
> wrote:
>
> >
> >
> >
> > I have configured NGram Indexing for some fields.
> >
> > Say I search for the city Ludlow, I get the results (normal search)
> >
> > If I search for Ludlo (with w omitted) I get the results.
> >
> > If I search for Ludl (with ow omitted) I still get the results.
> >
> > I know that they are all partial strings of the main string, hence
> > NGram works perfectly.
> >
> > But when I type in Ludlwo (misspelt, characters o and w interchanged)
> > I don't get any results. It should ideally match "Ludl" and provide the
> > results.
> >
> > I am not looking for Edit distance based Spell Correctors. How can I
> > make above NGram based search work?
> >
> > Here is my schema.xml (NGramFieldType):
> >
> >  > stored="false" multiValued="true">
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> >  > maxGramSize="15" side="front" />
> >
> >
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> >
> > 
> > 
> > **
> > This message may contain confidential or
> > proprietary information intended only for the use of
> > the addressee(s) named above or may contain information that is
> > legally privileged. If you are not the intended addressee, or the
> > person responsible for delivering it to the intended addressee, you
> > are hereby notified that reading, disseminating, distributing or
> > copying this message is strictly prohibited. If you have received
> > this message by mistake, please immediately notify us by replying
> > to the message and delete the original message and any copies
> > immediately thereafter. Thank you.~
> >
> > **
> > 
> > FAFLD
> > 
> >
>


Re: NGram for misspelt words

2012-07-18 Thread Dikchant Sahi
You are creating grams only while indexing and not while querying, hence
'ludlwo' would not match. Your analyzer will create the following grams while
indexing 'ludlow': lu lud ludl ludlo ludlow, and hence would not match
'ludlwo'.

Either you need to create grams while querying as well, or use edit distance.
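A rough sketch of the first option, gramming at query time too (illustrative only — the type name, tokenizer, and gram sizes below are assumptions, not taken from the original schema):

```xml
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- gram the query too, so 'ludlwo' shares grams ('lu', 'dl', ...) with 'ludlow' -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>
```

Note that gramming both sides makes matching very loose, so expect many partial hits; an edit-distance-based spellchecker is usually the cleaner fix for transposed characters.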

On Wed, Jul 18, 2012 at 7:43 PM, Husain, Yavar  wrote:

>
>
>
> I have configured NGram Indexing for some fields.
>
> Say I search for the city Ludlow, I get the results (normal search)
>
> If I search for Ludlo (with w omitted) I get the results.
>
> If I search for Ludl (with ow omitted) I still get the results.
>
> I know that they are all partial strings of the main string, hence NGram
> works perfectly.
>
> But when I type in Ludlwo (misspelt, characters o and w interchanged) I
> don't get any results. It should ideally match "Ludl" and provide the
> results.
>
> I am not looking for Edit distance based Spell Correctors. How can I make
> above NGram based search work?
>
> Here is my schema.xml (NGramFieldType):
>
>  stored="false" multiValued="true">
>
> 
>
> 
>
> 
>
> 
>
>  maxGramSize="15" side="front" />
>
>
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
>
> 
> 
>


Re: Big Data Analysis and Management - 2 day Workshop

2012-05-23 Thread Dikchant Sahi
Hi Manish,

The attachment seems to be missing. Would you mind sharing it again?

I am a Search Engineer based in Bangalore and would be interested in attending
the workshop.

Best Regards,
Dikchant Sahi

On Thu, May 24, 2012 at 10:22 AM, Manish Bafna wrote:

> Dear Friend,
> We are organizing a workshop on Big Data. Here are details regarding the
> same.
> Please forward it to your company HR and also your friends and let me know
> if anyone is interested. We have an early bird offer if registration is done
> before 31st May 2012.
>
>
> Big Data is one space that is buzzing in the market big time. There are
> several applications of the various technologies around Big Data. Often,
> when we work on a project or product, we all streamline our time and energy
> towards its successful delivery. To ensure your colleagues don't miss out on
> this hot topic and stay abreast of these niche areas, we thought we would
> share our expertise with Senior Developers and Architects through this
> workshop on *Big Data Analysis and Management*, scheduled in *Bangalore on
> June 16th and 17th.*
>
> We will be covering various topics under the following 4 broad headlines.
> You can check the attached outline for a detailed insight into what we will
> cover under each head. It is definitely going to be an intensive and
> relevant hands-on session along with vivid explanation of concepts and
> theories around it. On a lighter note, there is definitely going to be a lot
> of jargon flowing around all participants in this short span of two days.
>
> * Content Extraction (hands-on using Apache Tika)
> * Distribute Content in NOSQL ways (hands-on using Cassandra, Neo4j)
> * Search and Indexing (hands-on using Solr and Tika)
> * Distributed computing and analysis using Hadoop MapReduce and Mahout
>   (hands-on using Hadoop MapReduce, Mahout)
> To register for this workshop, kindly send me a mail with the details of
> the participants (their profiles would be even better) and the payment
> details.
>
> I am enclosing the complete course details with this mail. Two of my peers
> and I will be delivering this workshop. You can find our brief profiles in
> the attached content.
>
> Feel free to contact me any time with any queries.
>
>  With best regards,
> Manish.
>