RE: Update JSON format post 3.1?

2012-07-05 Thread Klostermeyer, Michael
Sorry...I found the answer in the comments of the previously mentioned Jira 
ticket.  The solution proposed there differed from the final one: the "doc" 
structure key is apparently not needed.
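For anyone who lands here later: under the final SOLR-2496 behavior, a batch of 
documents can be posted to /update/json as a bare top-level array (field names 
are from my example below; the second document here is invented):

```json
[
  {"ID": "987654321", "Name": "Steve Smith", "ChildIDs": ["3841"]},
  {"ID": "987654322", "Name": "Jane Smith", "ChildIDs": ["3842", "3843"]}
]
```

A single document can still be wrapped as {"add": {"doc": {...}}}, but "doc" 
must then be a single object; making it an array is what triggers the 
"Expected: OBJECT_START but got ARRAY_START" error.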

Mike


-----Original Message-----
From: Klostermeyer, Michael [mailto:mklosterme...@riskexchange.com] 
Sent: Thursday, July 05, 2012 12:55 PM
To: solr-user@lucene.apache.org
Subject: Update JSON format post 3.1?

Is there any official documentation on the JSON format Solr 3.5 expects when 
adding/updating documents via /update/json?  Most of the official documentation 
covers 3.1, and I understand the format changed in v3.2 
(https://issues.apache.org/jira/browse/SOLR-2496).  I believe I have the 
correct format, but I am getting an odd error:

"The request sent by the client was syntactically incorrect (Expected: 
OBJECT_START but got ARRAY_START)"

My JSON format is as follows (simplified):
{"add":
  {"doc":
    [
      {"ID":"987654321","Name":"Steve Smith","ChildIDs":["3841"]}
    ]
  }
}

The idea is that I want to be able to send multiple documents within the same 
request, although in this example I am demonstrating only a single document. 
"ChildIDs" is defined as a multivalued field.

Thanks.

Mike Klostermeyer


RE: DIH - unable to ADD individual new documents

2012-07-03 Thread Klostermeyer, Michael
I haven't, but will consider those alternatives.  I think right now I'm going 
to go w/ a hybrid approach, meaning my scheduled and full updates will continue 
to use the DIH, as those seem to work really well.  My NTR indexing needs will 
be handled via the JSON processor.  For individual updates this will enable me 
to utilize an existing ORM infrastructure fairly easily (famous last words, I 
know).

Thanks for the help, as always.

Mike


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, July 03, 2012 2:58 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH - unable to ADD individual new documents

Mike:

Have you considered using one (or several) SolrJ clients to do your indexing? 
That can give you finer-grained control than DIH. Or even do your NRT 
updates with SolrJ...

Here's an example program; you can take out the Tika stuff pretty easily.

Best
Erick

On Tue, Jul 3, 2012 at 3:35 PM, Klostermeyer, Michael wrote:
> Well that little bit of knowledge changes things for me, doesn't it?  I 
> appreciate your response very much.  Without knowing that about the DIH, I 
> attempted to have my DIH handler handle all circumstances, namely the 
> "batch", scheduled job, and immediate/NRT indexing.  Looks like I'm going to 
> have to severely re-think that strategy.
>
> Thanks again...and if anyone has any further input how I can best/most 
> efficiently accomplish all 3 above, please let me know.
>
> Mike
>
>
> -----Original Message-----
> From: Dyer, James [mailto:james.d...@ingrambook.com]
> Sent: Tuesday, July 03, 2012 1:12 PM
> To: solr-user@lucene.apache.org
> Subject: RE: DIH - unable to ADD individual new documents
>
> A DIH request handler can only process one "run" at a time.  So if DIH is 
> still in process and you kick off a new DIH "full-import" it will silently 
> ignore the new command.  To have more than one DIH "run" going at a time it 
> is necessary to configure more than one handler instance in solrconfig.xml.  
> But even then you'll have to be careful to find one that is free before 
> trying to use it.
>
> Regardless, to do what you want, you'll need to poll the DIH response screen 
> to be sure it isn't running before starting a new one.  It would be simplest 
> to leave it with just 1 DIH handler in solrconfig.xml.  If you've got to have 
> an undefined # of concurrent updates going at once you're best off to not use 
> DIH.
>
> Perhaps a better usage pattern, one DIH was designed for, is to put the 
> doc IDs in an update table with a timestamp.  Have your queries join to the 
> update table "where timestamp > ${dih.last_index_time}".  Set up crontab or 
> whatever to kick off DIH every so often.  If the prior run is still in 
> progress, it will just skip that run, but because we're dealing with 
> timestamps that get written automatically when DIH finishes, you will only 
> experience a delayed update, not a lost update.  By batching your updates 
> like this you will also have fewer commits, which will be beneficial for 
> performance all around.
>
> Of course if you're trying to do this with the near-real-time functionality 
> batching isn't your answer.  But DIH isn't designed at all to work well with 
> NRT either...
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: Klostermeyer, Michael [mailto:mklosterme...@riskexchange.com]
> Sent: Tuesday, July 03, 2012 1:55 PM
> To: solr-user@lucene.apache.org
> Subject: RE: DIH - unable to ADD individual new documents
>
> Some interesting findings over the last hours, that may change the context of 
> this discussion...
>
> Due to the nature of the application, I need the ability to fire off 
> individual "ADDs" on several different entities at basically the same time.  
> So, I am making 2-4 Solr ADD calls within 100ms of each other.  While 
> troubleshooting this, I found that if I only made 1 Solr ADD call (ignoring 
> the other entities), it updated the index as expected.  However, when all 
> were fired off, proper indexing did not occur (at least on one of the 
> entities) and no errors were logged.  I am still attempting to figure out if 
> ALL of the 2-4 entities failed to ADD, or if some failed and others succeeded.
>
> So...does this have something to do with Solr's index/message queuing (v3.5)? 
>  How does Solr handle these types of rapid requests, and even more important, 
> how do I get the status of an individual DIH call vs simply the status of the 
> "latest" call at /dataimport?
>
> Mike
>
>

RE: DIH - unable to ADD individual new documents

2012-07-03 Thread Klostermeyer, Michael
Well that little bit of knowledge changes things for me, doesn't it?  I 
appreciate your response very much.  Without knowing that about the DIH, I 
attempted to have my DIH handler handle all circumstances, namely the "batch", 
scheduled job, and immediate/NRT indexing.  Looks like I'm going to have to 
severely re-think that strategy.

Thanks again...and if anyone has any further input how I can best/most 
efficiently accomplish all 3 above, please let me know.

Mike


-----Original Message-----
From: Dyer, James [mailto:james.d...@ingrambook.com] 
Sent: Tuesday, July 03, 2012 1:12 PM
To: solr-user@lucene.apache.org
Subject: RE: DIH - unable to ADD individual new documents

A DIH request handler can only process one "run" at a time.  So if DIH is still 
in process and you kick off a new DIH "full-import" it will silently ignore the 
new command.  To have more than one DIH "run" going at a time it is necessary 
to configure more than one handler instance in solrconfig.xml.  But even then 
you'll have to be careful to find one that is free before trying to use it.
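
For reference, registering a second DIH instance is just another requestHandler 
entry in solrconfig.xml pointing at its own (or the same) config file; the 
handler name below is invented:

```xml
<!-- hypothetical second DIH instance; each handler runs one import at a time -->
<requestHandler name="/dataimport2"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```

Each instance can then be polled with command=status (e.g. 
/dataimport2?command=status) to find one that is idle.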

Regardless, to do what you want, you'll need to poll the DIH response screen to 
be sure it isn't running before starting a new one.  It would be simplest to 
leave it with just 1 DIH handler in solrconfig.xml.  If you've got to have an 
undefined # of concurrent updates going at once you're best off to not use DIH.

Perhaps a better usage pattern, one DIH was designed for, is to put the doc 
IDs in an update table with a timestamp.  Have your queries join to the update 
table "where timestamp > ${dih.last_index_time}".  Set up crontab or whatever 
to kick off DIH every so often.  If the prior run is still in progress, it will 
just skip that run, but because we're dealing with timestamps that get written 
automatically when DIH finishes, you will only experience a delayed update, not 
a lost update.  By batching your updates like this you will also have fewer 
commits, which will be beneficial for performance all around.
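
A sketch of that update-table pattern in data-config.xml, with hypothetical 
table and column names:

```xml
<!-- hypothetical: index only rows touched since the last DIH run -->
<entity name="item"
        query="SELECT i.id, i.name
               FROM items i
               JOIN item_updates u ON u.item_id = i.id
               WHERE u.updated_at &gt; '${dih.last_index_time}'"/>
```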

Of course if you're trying to do this with the near-real-time functionality 
batching isn't your answer.  But DIH isn't designed at all to work well with 
NRT either...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Klostermeyer, Michael [mailto:mklosterme...@riskexchange.com] 
Sent: Tuesday, July 03, 2012 1:55 PM
To: solr-user@lucene.apache.org
Subject: RE: DIH - unable to ADD individual new documents

Some interesting findings over the last hours, that may change the context of 
this discussion...

Due to the nature of the application, I need the ability to fire off individual 
"ADDs" on several different entities at basically the same time.  So, I am 
making 2-4 Solr ADD calls within 100ms of each other.  While troubleshooting 
this, I found that if I only made 1 Solr ADD call (ignoring the other 
entities), it updated the index as expected.  However, when all were fired off, 
proper indexing did not occur (at least on one of the entities) and no errors 
were logged.  I am still attempting to figure out if ALL of the 2-4 entities 
failed to ADD, or if some failed and others succeeded.

So...does this have something to do with Solr's index/message queuing (v3.5)?  
How does Solr handle these types of rapid requests, and even more important, 
how do I get the status of an individual DIH call vs simply the status of the 
"latest" call at /dataimport?

Mike


-----Original Message-----
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Monday, July 02, 2012 10:02 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH - unable to ADD individual new documents

On 3 July 2012 07:54, Klostermeyer, Michael wrote:
> I should add that I am using the full-import command in all cases, and 
> setting clean=false for the individual adds.

What does the data-import page report at the end of the full-import, i.e., how 
many documents were indexed?
Are there any error messages in the Solr logs? Please share with us your DIH 
configuration file, and Solr schema.xml.

Regards,
Gora


RE: DIH - unable to ADD individual new documents

2012-07-02 Thread Klostermeyer, Michael
The URL I am using is 
http://localhost/solr/dataimport?commit=true&wt=json&clean=false&uniqueID=2028046&command=full%2Dimport&entity=myEntityName

uniqueID is the ID of the newly created DB record.  This ID gets passed to the 
stored procedure, which returns the expected data when I run the SP directly.

Mike


-----Original Message-----
From: Klostermeyer, Michael [mailto:mklosterme...@riskexchange.com] 
Sent: Monday, July 02, 2012 8:24 PM
To: solr-user@lucene.apache.org
Subject: RE: DIH - unable to ADD individual new documents

I should add that I am using the full-import command in all cases, and setting 
clean=false for the individual adds.

Mike


-----Original Message-----
From: Klostermeyer, Michael [mailto:mklosterme...@riskexchange.com] 
Sent: Monday, July 02, 2012 5:41 PM
To: solr-user@lucene.apache.org
Subject: DIH - unable to ADD individual new documents

I am not able to ADD individual documents via the DIH, but updating works as 
expected.  The stored procedure that is called within the DIH returns the 
expected data for the new document, and Solr appears to "do its thing", but the 
document never makes it into the index, as evidenced by subsequent queries not 
returning it.

Is there a trick to adding new documents using the DIH?

Mike





RE: NGram and full word

2012-06-29 Thread Klostermeyer, Michael
With the help of this list, I solved a similar issue by altering my query as 
follows:

Before (did not return full word matches): q=searchTerm*
After (returned full-word matches and wildcard searches as you would expect): 
q=searchTerm OR searchTerm*

You can also boost the exact match by doing the following: q=searchTerm^2 OR 
searchTerm*

Not sure if the NGram changes things or not, but it might be a starting point.
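
If the query trick doesn't pan out, another common approach (an assumption on 
my part, not something I've tested against your schema; all field names below 
are invented) is to keep a non-grammed copy of the field and search both:

```xml
<!-- hypothetical: gram field for prefix matches, plain field for whole words -->
<field name="name_ngram" type="text_ngram"   indexed="true" stored="false"/>
<field name="name_full"  type="text_general" indexed="true" stored="false"/>
<copyField source="name" dest="name_ngram"/>
<copyField source="name" dest="name_full"/>
<!-- then query: q=name_ngram:arkadicols OR name_full:arkadicolson -->
```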

Mike


-----Original Message-----
From: Arkadi Colson [mailto:ark...@smartbit.be] 
Sent: Friday, June 29, 2012 3:17 AM
To: solr-user@lucene.apache.org
Subject: NGram and full word

Hi

I have a question regarding the NGram filter and full word search.

When I insert "arkadicolson" into Solr and search for "arkadic", Solr will find 
a match.
When searching for "arkadicols", Solr will not find a match because the 
maxGramSize is set to 8.
However when searching for the full word "arkadicolson" Solr will also not 
match.

Is there a way to also match full word in combination with NGram?

Thanks!

[The schema fieldType snippet was stripped by the mailing-list archive; per the 
message, it applies an NGram filter with maxGramSize=8.]

--
Smartbit bvba
Hoogstraat 13
B-3670 Meeuwen
T: +32 11 64 08 80
F: +32 89 46 81 10
W: http://www.smartbit.be
E: ark...@smartbit.be



RE: Wildcard queries on whole words

2012-06-27 Thread Klostermeyer, Michael
Interesting solution.  Can you then explain to me for a given query:

?q='kloster' OR kloster*

How the "exact match" part of that is boosted (assuming the above is how you 
formulated your query)?

Thanks!

Mike

-----Original Message-----
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] 
Sent: Wednesday, June 27, 2012 11:11 AM
To: solr-user@lucene.apache.org
Subject: Re: Wildcard queries on whole words

Hi Michael,

I solved a similar issue by reformatting my query to do an OR across an exact 
match or a wildcard query, with the exact match boosted.
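
Concretely, something like this (term and boost value are illustrative):

```
q=klostermeyer^2 OR klostermeyer*
```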

HTH,

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn't a Game.
http://www.appinions.com


On Wed, Jun 27, 2012 at 12:14 PM, Klostermeyer, Michael wrote:
> I am researching an issue w/ wildcard searches on complete words in 3.5.  For 
> example, searching for "kloster*" returns "klostermeyer", but "klostermeyer*" 
> returns nothing.
>
> The field being queried has the following analysis chain (standard 
> 'text_general'):
>
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" 
>             words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" 
>             words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.SynonymFilterFactory" 
>             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I see that wildcard queries are not analyzed at query time, which could be 
> the source of my issue, but I read conflicting advice on the interwebs.  I 
> read also that this might have changed in 3.6, but I am unable to determine 
> if my specific issue is addressed.
>
> My questions:
>
> 1.       Why am I getting these search results with my current config?
>
> 2.       How do I fix it in 3.5?  Would upgrading to 3.6 also "fix" my issue?
>
> Thanks!
>
> Mike Klostermeyer
>




RE: Many Cores with Solr

2012-05-29 Thread Klostermeyer, Michael
IMO it would be better (from Solr's perspective) to handle the security in 
the application code.  Each query could include an "fq=userID:12345..." filter, 
which would limit results to only what that user is allowed to see.
 
Mike

-----Original Message-----
From: Mike Douglass [mailto:mikeadougl...@gmail.com] 
Sent: Wednesday, May 23, 2012 4:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Many Cores with Solr

My interest in this is the desire to create one index per user of a system - 
the issue here is privacy - data indexed for one user should not be visible to 
other users.

For this purpose Solr will be hidden behind a proxy which steers authenticated 
sessions to the appropriate core.

Does this seem like a valid/feasible approach?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3985789.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SOLR Security

2012-05-10 Thread Klostermeyer, Michael
Instead of hitting the Solr server directly from the client, I would go through 
your application server, which has access to all of the user's data and can 
forward it to the Solr server, thereby hiding it from the client.

Mike


-----Original Message-----
From: Anupam Bhattacharya [mailto:anupam...@gmail.com] 
Sent: Thursday, May 10, 2012 9:53 PM
To: solr-user@lucene.apache.org
Subject: SOLR Security

I am using Ajax-Solr Framework for creating a search interface. The search 
interface works well.
In my case, the results have document-level security, so indexing records along 
with their authorized users lets me filter results per user based on the user's 
authentication.

The problem is that I always have to pass a parameter to the SOLR server, 
userid={xyz}, which anyone can discover from the SOLR URL (the Ajax call URL) 
using the Firebug Net console in Firefox, and then change the parameter value 
to see other users' records that he/she is not authorized to see. Basically it 
is a cross-site scripting issue.

I have read about some approaches to Solr security, like Nginx with Jetty and 
.htaccess-based security. Overall, what I understand from this is that we can 
restrict users from doing update/delete operations on SOLR, and we can also 
restrict the SOLR admin interface to certain IPs. But how can I restrict the 
{solr-server}/solr/select results from access by different user ids?


Populating 'multivalue' fields (m:1 relationships)

2012-05-10 Thread Klostermeyer, Michael
I am attempting to index a DB schema that has a many:one relationship.  I 
assume I would index this within Solr as a multiValued="true" field; is that 
correct?

I am currently populating the Solr index w/ a stored procedure in which each DB 
record is "flattened" into a single document in Solr.  I would like one of 
those Solr document fields to contain multiple values from the m:1 table (i.e. 
[fieldName]=1,3,6,8,7).  I then need to be able to do a "fq=fieldname:3" and 
return the previous record.

My question is: how do I populate Solr with a multi-valued field for many:1 
relationships?  My first guess would be to concatenate all the values from the 
'many' side into a single DB column in the SP, then pipe that column into a 
multiValued="true" Solr field.  The DB side of that will be ugly, but would the 
Solr side index this properly?  If so, what delimiter would allow Solr to index 
each element of the multivalued field?
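
If you go the concatenated-column route, DIH can do the splitting for you: the 
RegexTransformer's splitBy attribute turns a delimited column into a multivalued 
field (entity, procedure, and column names below are invented):

```xml
<!-- hypothetical: SP returns Titles as "Sales,Assistant To The Regional Manager" -->
<entity name="person" transformer="RegexTransformer"
        query="EXEC dbo.GetPeopleWithTitles">
  <field column="Titles" splitBy=","/>
</entity>
```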

[Warning: possible tangent below...but I think this question is relevant.  If 
not, tell me and I'll break it out]

I have gone out of my way to "flatten" the data within my SP prior to giving it 
to Solr.  For my solution stated above, I would have the following data (Title 
being the "many" side of the m:1, and PK being the Solr unique ID):

PK | Name | Title
Pk_1 | Dwight | Sales, Assistant To The Regional Manager
Pk_2 | Jim | Sales
Pk_3 | Michael | Regional Manager

Below is an example of a non-flattened record set.  How would Solr handle a 
data set in which the following data was indexed:

PK | Name | Title
Pk_1 | Dwight | Sales
Pk_1 | Dwight | Assistant To The Regional Manager
Pk_2 | Jim | Sales
Pk_3 | Michael | Regional Manager

My assumption is that the second Pk_1 record would overwrite the first, thereby 
losing the "Sales" title from Pk_1.  Am I correct on that assumption?

I'm new to this ballgame, so don't be shy about pointing me down a different 
path if I am doing anything incorrectly.

Thanks!

Mike Klostermeyer


RE: Auto suggest on indexed file content filtered based on user

2012-04-24 Thread Klostermeyer, Michael
I'm new to Solr, but I would think the fq=[username] would work here.

http://wiki.apache.org/solr/CommonQueryParameters#fq
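
For example (field and user names invented; assumes a userName field is indexed 
on each suggestion document):

```
q=text&fq=userName:jsmith
```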

Mike

-----Original Message-----
From: prakash_ajp [mailto:prakash_...@yahoo.com] 
Sent: Tuesday, April 24, 2012 11:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Auto suggest on indexed file content filtered based on user

Right now, the query is a very simple one, something like q=text. Basically, it 
would return ['textview', 'textviewer', ..]

But the issue is, the 'textviewer' could be from a file that is out of bounds 
for this user. So, ultimately I would like to include the userName in the 
query. As mentioned earlier, userName is another field in the main index.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3935765.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: using stored procedures in solr query..

2012-04-03 Thread Klostermeyer, Michael
Yes, I just did this in my DIH with SQL Server 2008 and Solr 3.5; it looked 
somewhat like the following:

[The data-config.xml snippet was stripped by the mailing-list archive.]

Mike Klostermeyer
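
A sketch of the general shape of such an entity (procedure and parameter names 
invented; the original snippet didn't survive the archive):

```xml
<!-- hypothetical: data-config.xml entity calling a SQL Server stored procedure,
     with a request parameter passed through to the proc -->
<entity name="doc"
        query="EXEC dbo.GetSolrDocs @id = '${dataimporter.request.uniqueID}'"/>
```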

-----Original Message-----
From: vighnesh [mailto:svighnesh...@gmail.com] 
Sent: Tuesday, April 03, 2012 5:29 AM
To: solr-user@lucene.apache.org
Subject: using stored procedures in solr query..
Importance: Low

Hi all,

Is it possible to execute stored procedures placed in the data-config.xml file 
in Solr?

please give response ...

Thanx in advance.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/using-storedprocedures-in-solr-query-tp3880557p3880557.html
Sent from the Solr - User mailing list archive at Nabble.com.