RE: Update JSON format post 3.1?
Sorry...I found the answer in the comments of the previously mentioned Jira ticket. Apparently the proposed solution differed from the final one (the "doc" structure key is not needed).

Mike

-----Original Message-----
From: Klostermeyer, Michael [mailto:mklosterme...@riskexchange.com]
Sent: Thursday, July 05, 2012 12:55 PM
To: solr-user@lucene.apache.org
Subject: Update JSON format post 3.1?

Is there any official documentation on the JSON format Solr 3.5 expects when adding/updating documents via /update/json? Most of the official documentation is for 3.1, and I understand this changed in v3.2 (https://issues.apache.org/jira/browse/SOLR-2496). I believe I have the correct format, but I am getting an odd error:

"The request sent by the client was syntactically incorrect (Expected: OBJECT_START but got ARRAY_START)"

My JSON format is as follows (simplified):

{"add": {"doc": [ {"ID":"987654321","Name":"Steve Smith","ChildIDs":["3841"]} ] } }

The idea is that I want to be able to send multiple documents within the same request, although in this example I am demonstrating only a single document. "ChildIDs" is defined as a multivalued field.

Thanks.

Mike Klostermeyer
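For anyone finding this thread later, here is a sketch of the two payload shapes implied by the resolution above: the "doc" wrapper holds a single JSON object (an array there triggers the OBJECT_START error), and multiple documents go in a top-level JSON array instead. This reflects my reading of SOLR-2496; the second document and its values are invented for illustration.

```python
import json

# Single document: "doc" must be a JSON object, not an array.
# Wrapping the object in [...] is what produces
# "Expected: OBJECT_START but got ARRAY_START".
single = {"add": {"doc": {"ID": "987654321",
                          "Name": "Steve Smith",
                          "ChildIDs": ["3841"]}}}

# Multiple documents: send a top-level JSON array of document objects.
multiple = [
    {"ID": "987654321", "Name": "Steve Smith", "ChildIDs": ["3841"]},
    {"ID": "987654322", "Name": "Jane Smith", "ChildIDs": ["3842", "3843"]},
]

body_single = json.dumps(single)
body_multiple = json.dumps(multiple)
```

Either body would then be POSTed to /update/json with Content-Type application/json.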
Update JSON format post 3.1?
Is there any official documentation on the JSON format Solr 3.5 expects when adding/updating documents via /update/json? Most of the official documentation is for 3.1, and I understand this changed in v3.2 (https://issues.apache.org/jira/browse/SOLR-2496). I believe I have the correct format, but I am getting an odd error:

"The request sent by the client was syntactically incorrect (Expected: OBJECT_START but got ARRAY_START)"

My JSON format is as follows (simplified):

{"add": {"doc": [ {"ID":"987654321","Name":"Steve Smith","ChildIDs":["3841"]} ] } }

The idea is that I want to be able to send multiple documents within the same request, although in this example I am demonstrating only a single document. "ChildIDs" is defined as a multivalued field.

Thanks.

Mike Klostermeyer
RE: DIH - unable to ADD individual new documents
I haven't, but will consider those alternatives. I think right now I'm going to go with a hybrid approach, meaning my scheduled and full updates will continue to use the DIH, as those seem to work really well. My NRT indexing needs will be handled via the JSON processor. For individual updates this will enable me to utilize an existing ORM infrastructure fairly easily (famous last words, I know).

Thanks for the help, as always.

Mike

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, July 03, 2012 2:58 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH - unable to ADD individual new documents

Mike:

Have you considered using one (or several) SolrJ clients to do your indexing? That can give you finer control granularity than DIH. Or you could even do your NRT indexing with SolrJ. Here's an example program; you can take out the Tika stuff pretty easily.

Best
Erick

On Tue, Jul 3, 2012 at 3:35 PM, Klostermeyer, Michael wrote:
> Well that little bit of knowledge changes things for me, doesn't it? I appreciate your response very much. Without knowing that about the DIH, I attempted to have my DIH handler handle all circumstances, namely the "batch", scheduled job, and immediate/NRT indexing. Looks like I'm going to have to severely re-think that strategy.
>
> Thanks again...and if anyone has any further input on how I can best/most efficiently accomplish all 3 above, please let me know.
>
> Mike
>
> -----Original Message-----
> From: Dyer, James [mailto:james.d...@ingrambook.com]
> Sent: Tuesday, July 03, 2012 1:12 PM
> To: solr-user@lucene.apache.org
> Subject: RE: DIH - unable to ADD individual new documents
>
> A DIH request handler can only process one "run" at a time. So if DIH is still in process and you kick off a new DIH "full-import", it will silently ignore the new command. To have more than one DIH "run" going at a time, it is necessary to configure more than one handler instance in solrconfig.xml. But even then you'll have to be careful to find one that is free before trying to use it.
>
> Regardless, to do what you want, you'll need to poll the DIH response screen to be sure it isn't running before starting a new one. It would be simplest to leave it with just 1 DIH handler in solrconfig.xml. If you've got to have an undefined # of concurrent updates going at once, you're best off not using DIH.
>
> Perhaps a better usage pattern, and one DIH was designed for, is to put the doc id's in an update table with a timestamp. Have your queries join to the update table "where timestamp > ${dih.last_index_time}". Set up crontab or whatever to kick off DIH every so often. If the prior run is still in progress, it will just skip that run, but because we're dealing with timestamps that get written automatically when DIH finishes, you will only experience a delayed update, not a lost update. By batching your updates like this you will also have fewer commits, which will be beneficial for performance all around.
>
> Of course, if you're trying to do this with the near-real-time functionality, batching isn't your answer. But DIH isn't designed at all to work well with NRT either...
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: Klostermeyer, Michael [mailto:mklosterme...@riskexchange.com]
> Sent: Tuesday, July 03, 2012 1:55 PM
> To: solr-user@lucene.apache.org
> Subject: RE: DIH - unable to ADD individual new documents
>
> Some interesting findings over the last hours that may change the context of this discussion...
>
> Due to the nature of the application, I need the ability to fire off individual "ADDs" on several different entities at basically the same time. So, I am making 2-4 Solr ADD calls within 100ms of each other. While troubleshooting this, I found that if I only made 1 Solr ADD call (ignoring the other entities), it updated the index as expected. However, when all were fired off, proper indexing did not occur (at least on one of the entities) and no errors were logged. I am still attempting to figure out if ALL of the 2-4 entities failed to ADD, or if some failed and others succeeded.
>
> So...does this have something to do with Solr's index/message queuing (v3.5)? How does Solr handle these types of rapid requests, and even more important, how do I get the status of an individual DIH call vs. simply the status of the "latest" call at /dataimport?
>
> Mike
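The polling advice in this thread can be sketched as a small helper: check the /dataimport status response before issuing the next full-import, since a busy handler silently ignores the command. The "status" values and response shape below are hand-written to resemble the 3.x DIH JSON output (wt=json); verify them against your own handler's response.

```python
import json

def dih_is_idle(status_response: str) -> bool:
    """Return True if the DIH handler reports it is not mid-import.

    status_response is the raw JSON body of /dataimport?command=status&wt=json.
    """
    return json.loads(status_response).get("status") == "idle"

# Example responses, shaped like DIH status output (illustrative only):
busy = '{"status": "busy", "importResponse": "A command is still running..."}'
idle = '{"status": "idle", "importResponse": ""}'
```

The caller would fetch the status URL, pass the body to dih_is_idle, and only kick off a new full-import (or fall back to retrying later) when it returns True.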
RE: DIH - unable to ADD individual new documents
Well that little bit of knowledge changes things for me, doesn't it? I appreciate your response very much. Without knowing that about the DIH, I attempted to have my DIH handler handle all circumstances, namely the "batch", scheduled job, and immediate/NRT indexing. Looks like I'm going to have to severely re-think that strategy.

Thanks again...and if anyone has any further input on how I can best/most efficiently accomplish all 3 above, please let me know.

Mike

-----Original Message-----
From: Dyer, James [mailto:james.d...@ingrambook.com]
Sent: Tuesday, July 03, 2012 1:12 PM
To: solr-user@lucene.apache.org
Subject: RE: DIH - unable to ADD individual new documents

A DIH request handler can only process one "run" at a time. So if DIH is still in process and you kick off a new DIH "full-import", it will silently ignore the new command. To have more than one DIH "run" going at a time, it is necessary to configure more than one handler instance in solrconfig.xml. But even then you'll have to be careful to find one that is free before trying to use it.

Regardless, to do what you want, you'll need to poll the DIH response screen to be sure it isn't running before starting a new one. It would be simplest to leave it with just 1 DIH handler in solrconfig.xml. If you've got to have an undefined # of concurrent updates going at once, you're best off not using DIH.

Perhaps a better usage pattern, and one DIH was designed for, is to put the doc id's in an update table with a timestamp. Have your queries join to the update table "where timestamp > ${dih.last_index_time}". Set up crontab or whatever to kick off DIH every so often. If the prior run is still in progress, it will just skip that run, but because we're dealing with timestamps that get written automatically when DIH finishes, you will only experience a delayed update, not a lost update. By batching your updates like this you will also have fewer commits, which will be beneficial for performance all around.

Of course, if you're trying to do this with the near-real-time functionality, batching isn't your answer. But DIH isn't designed at all to work well with NRT either...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Klostermeyer, Michael [mailto:mklosterme...@riskexchange.com]
Sent: Tuesday, July 03, 2012 1:55 PM
To: solr-user@lucene.apache.org
Subject: RE: DIH - unable to ADD individual new documents

Some interesting findings over the last hours that may change the context of this discussion...

Due to the nature of the application, I need the ability to fire off individual "ADDs" on several different entities at basically the same time. So, I am making 2-4 Solr ADD calls within 100ms of each other. While troubleshooting this, I found that if I only made 1 Solr ADD call (ignoring the other entities), it updated the index as expected. However, when all were fired off, proper indexing did not occur (at least on one of the entities) and no errors were logged. I am still attempting to figure out if ALL of the 2-4 entities failed to ADD, or if some failed and others succeeded.

So...does this have something to do with Solr's index/message queuing (v3.5)? How does Solr handle these types of rapid requests, and even more important, how do I get the status of an individual DIH call vs. simply the status of the "latest" call at /dataimport?

Mike

-----Original Message-----
From: Gora Mohanty [mailto:g...@mimirtech.com]
Sent: Monday, July 02, 2012 10:02 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH - unable to ADD individual new documents

On 3 July 2012 07:54, Klostermeyer, Michael wrote:
> I should add that I am using the full-import command in all cases, and setting clean=false for the individual adds.

What does the data-import page report at the end of the full-import, i.e., how many documents were indexed? Are there any error messages in the Solr logs? Please share with us your DIH configuration file and Solr schema.xml.

Regards,
Gora
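The update-table pattern James describes might look like the following in a DIH data-config.xml. This is a sketch only: the table and column names (Docs, DocUpdates, DocID, UpdatedAt) are invented, and ${dih.last_index_time} is the timestamp DIH records from the previous run, as quoted in the thread.

```xml
<!-- data-config.xml sketch: table/column names are invented. -->
<entity name="doc"
        query="SELECT d.* FROM Docs d
               JOIN DocUpdates u ON u.DocID = d.ID
               WHERE u.UpdatedAt &gt; '${dih.last_index_time}'"/>
```

A cron job then hits the handler's full-import command (with clean=false) on a schedule; rows touched since the last successful run are re-indexed, and anything missed while a run was in progress is picked up next time.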
RE: DIH - unable to ADD individual new documents
Some interesting findings over the last hours that may change the context of this discussion...

Due to the nature of the application, I need the ability to fire off individual "ADDs" on several different entities at basically the same time. So, I am making 2-4 Solr ADD calls within 100ms of each other. While troubleshooting this, I found that if I only made 1 Solr ADD call (ignoring the other entities), it updated the index as expected. However, when all were fired off, proper indexing did not occur (at least on one of the entities) and no errors were logged. I am still attempting to figure out if ALL of the 2-4 entities failed to ADD, or if some failed and others succeeded.

So...does this have something to do with Solr's index/message queuing (v3.5)? How does Solr handle these types of rapid requests, and even more important, how do I get the status of an individual DIH call vs. simply the status of the "latest" call at /dataimport?

Mike

-----Original Message-----
From: Gora Mohanty [mailto:g...@mimirtech.com]
Sent: Monday, July 02, 2012 10:02 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH - unable to ADD individual new documents

On 3 July 2012 07:54, Klostermeyer, Michael wrote:
> I should add that I am using the full-import command in all cases, and setting clean=false for the individual adds.

What does the data-import page report at the end of the full-import, i.e., how many documents were indexed? Are there any error messages in the Solr logs? Please share with us your DIH configuration file and Solr schema.xml.

Regards,
Gora
RE: DIH - unable to ADD individual new documents
The URL I am using is:

http://localhost/solr/dataimport?commit=true&wt=json&clean=false&uniqueID=2028046&command=full%2Dimport&entity=myEntityName

uniqueID is the ID of the newly created DB record. This ID gets passed to the stored procedure, which returns the expected data when I run the SP directly.

Mike

-----Original Message-----
From: Klostermeyer, Michael [mailto:mklosterme...@riskexchange.com]
Sent: Monday, July 02, 2012 8:24 PM
To: solr-user@lucene.apache.org
Subject: RE: DIH - unable to ADD individual new documents

I should add that I am using the full-import command in all cases, and setting clean=false for the individual adds.

Mike

-----Original Message-----
From: Klostermeyer, Michael [mailto:mklosterme...@riskexchange.com]
Sent: Monday, July 02, 2012 5:41 PM
To: solr-user@lucene.apache.org
Subject: DIH - unable to ADD individual new documents

I am not able to ADD individual documents via the DIH, but updating works as expected. The stored procedure that is called within the DIH returns the expected data for the new document, and Solr appears to "do its thing", but the document never makes it to the Solr server, as evidenced by the fact that subsequent queries do not return it. Is there a trick to adding new documents using the DIH?

Mike
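For readers puzzling over the %2D in the URL above: it is just a percent-encoded hyphen, so the command is the ordinary full-import. A sketch of building the same request programmatically, with the parameter names taken from Mike's URL (the note about ${dataimporter.request.uniqueID} reflects my understanding of how DIH exposes custom request parameters to the config; verify against your version):

```python
from urllib.parse import urlencode

params = {
    "command": "full-import",
    "commit": "true",
    "clean": "false",       # keep existing documents; only add/update
    "wt": "json",
    "entity": "myEntityName",
    "uniqueID": "2028046",  # custom parameter; readable inside the DIH
                            # config as ${dataimporter.request.uniqueID}
}
url = "http://localhost/solr/dataimport?" + urlencode(params)
```

urlencode leaves the hyphen in "full-import" unescaped, which Solr treats identically to the %2D form.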
RE: DIH - unable to ADD individual new documents
I should add that I am using the full-import command in all cases, and setting clean=false for the individual adds.

Mike

-----Original Message-----
From: Klostermeyer, Michael [mailto:mklosterme...@riskexchange.com]
Sent: Monday, July 02, 2012 5:41 PM
To: solr-user@lucene.apache.org
Subject: DIH - unable to ADD individual new documents

I am not able to ADD individual documents via the DIH, but updating works as expected. The stored procedure that is called within the DIH returns the expected data for the new document, and Solr appears to "do its thing", but the document never makes it to the Solr server, as evidenced by the fact that subsequent queries do not return it. Is there a trick to adding new documents using the DIH?

Mike
DIH - unable to ADD individual new documents
I am not able to ADD individual documents via the DIH, but updating works as expected. The stored procedure that is called within the DIH returns the expected data for the new document, and Solr appears to "do its thing", but the document never makes it to the Solr server, as evidenced by the fact that subsequent queries do not return it. Is there a trick to adding new documents using the DIH?

Mike
RE: NGram and full word
With the help of this list, I solved a similar issue by altering my query as follows:

Before (did not return full-word matches): q=searchTerm*

After (returned full-word matches and wildcard matches as you would expect): q=searchTerm OR searchTerm*

You can also boost the exact match by doing the following: q=searchTerm^2 OR searchTerm*

Not sure if the NGram filter changes things or not, but it might be a starting point.

Mike

-----Original Message-----
From: Arkadi Colson [mailto:ark...@smartbit.be]
Sent: Friday, June 29, 2012 3:17 AM
To: solr-user@lucene.apache.org
Subject: NGram and full word

Hi

I have a question regarding the NGram filter and full-word search. When I insert "arkadicolson" into Solr and search for "arkadic", Solr will find a match. When searching for "arkadicols", Solr will not find a match because the maxGramSize is set to 8. However, when searching for the full word "arkadicolson", Solr will also not match. Is there a way to also match the full word in combination with NGram?

Thanks!

--
Smartbit bvba
Hoogstraat 13
B-3670 Meeuwen
T: +32 11 64 08 80
F: +32 89 46 81 10
W: http://www.smartbit.be
E: ark...@smartbit.be
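The symptom in this thread falls out of how gram filters truncate indexed terms. Assuming an EdgeNGram-style filter on the index side with minGramSize=1 and maxGramSize=8 (Arkadi's exact config is not shown, so treat this as an illustration), the terms actually written to the index for "arkadicolson" can be simulated:

```python
def edge_ngrams(token: str, min_size: int = 1, max_size: int = 8):
    """Simulate the edge n-gram terms written to the index for one token."""
    return {token[:n] for n in range(min_size, min(max_size, len(token)) + 1)}

# Only prefixes up to 8 characters exist in the index, which is why
# "arkadic" matches but "arkadicols" and the full word do not.
indexed = edge_ngrams("arkadicolson")
```

Since no gram longer than 8 characters is indexed, any query term longer than that (including the untruncated word) finds nothing. The usual fixes are to raise maxGramSize, to also index the untruncated token (e.g. a copyField into a plain text field searched alongside the gram field), or the q=term OR term* rewrite suggested above.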
RE: Wildcard queries on whole words
Interesting solution. Can you then explain to me, for a given query:

?q='kloster' OR kloster*

how the "exact match" part of that is boosted (assuming the above is how you formulated your query)?

Thanks!

Mike

-----Original Message-----
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
Sent: Wednesday, June 27, 2012 11:11 AM
To: solr-user@lucene.apache.org
Subject: Re: Wildcard queries on whole words

Hi Michael,

I solved a similar issue by reformatting my query to do an OR across an exact match or a wildcard query, with the exact match boosted.

HTH,

Michael Della Bitta

Appinions, Inc. -- Where Influence Isn't a Game.
http://www.appinions.com

On Wed, Jun 27, 2012 at 12:14 PM, Klostermeyer, Michael wrote:
> I am researching an issue with wildcard searches on complete words in 3.5. For example, searching for "kloster*" returns "klostermeyer", but "klostermeyer*" returns nothing.
>
> The field being queried has the following analysis chain (standard 'text_general'):
>
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I see that wildcard queries are not analyzed at query time, which could be the source of my issue, but I read conflicting advice on the interwebs. I read also that this might have changed in 3.6, but I am unable to determine if my specific issue is addressed.
>
> My questions:
>
> 1. Why am I getting these search results with my current config?
> 2. How do I fix it in 3.5? Would upgrading to 3.6 also "fix" my issue?
>
> Thanks!
>
> Mike Klostermeyer
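To answer the boosting part of the question concretely: the ^ operator on the exact term raises its score contribution, so documents matching the whole word rank above prefix-only matches. A minimal sketch of building such a query string (the boost factor of 2 is arbitrary):

```python
def boosted_wildcard_query(term: str, boost: int = 2) -> str:
    """Exact match OR wildcard, with the exact term boosted."""
    return f"{term}^{boost} OR {term}*"

q = boosted_wildcard_query("kloster")
```

The resulting q parameter matches everything the wildcard alone would, but a document containing the exact token "kloster" scores both clauses, with the first one weighted up.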
Wildcard queries on whole words
I am researching an issue with wildcard searches on complete words in 3.5. For example, searching for "kloster*" returns "klostermeyer", but "klostermeyer*" returns nothing.

The field being queried has the following analysis chain (standard 'text_general'):

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I see that wildcard queries are not analyzed at query time, which could be the source of my issue, but I read conflicting advice on the interwebs. I read also that this might have changed in 3.6, but I am unable to determine if my specific issue is addressed.

My questions:

1. Why am I getting these search results with my current config?
2. How do I fix it in 3.5? Would upgrading to 3.6 also "fix" my issue?

Thanks!

Mike Klostermeyer
RE: Many Cores with Solr
IMO it would be better (from Solr's perspective) to handle the security with the application code. Each query could include "?fq=userID:12345...", which would limit results to only what that user is allowed to see.

Mike

-----Original Message-----
From: Mike Douglass [mailto:mikeadougl...@gmail.com]
Sent: Wednesday, May 23, 2012 4:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Many Cores with Solr

My interest in this is the desire to create one index per user of a system - the issue here is privacy - data indexed for one user should not be visible to other users.

For this purpose Solr will be hidden behind a proxy which steers authenticated sessions to the appropriate core.

Does this seem like a valid/feasible approach?

--
View this message in context: http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3985789.html
Sent from the Solr - User mailing list archive at Nabble.com.
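The single-index alternative suggested above can be sketched in a few lines: every document is indexed with an owner field, and the application appends an fq filter so one shared index serves all users. The field name userID and the values are illustrative, not from an actual schema.

```python
from urllib.parse import urlencode

def user_scoped_params(q: str, user_id: str) -> dict:
    """Query parameters that restrict results to one user's documents."""
    return {"q": q, "fq": f"userID:{user_id}", "wt": "json"}

params = user_scoped_params("calendar", "12345")
query_string = urlencode(params)
```

Because fq is a cached filter rather than part of the scored query, this scales far better than one core per user, at the cost of requiring that the filter be applied server-side where users cannot tamper with it.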
RE: SOLR Security
Instead of hitting the Solr server directly from the client, I think I would go through your application server, which has access to all the user's data and can forward the request to the Solr server, thereby hiding it from the client.

Mike

-----Original Message-----
From: Anupam Bhattacharya [mailto:anupam...@gmail.com]
Sent: Thursday, May 10, 2012 9:53 PM
To: solr-user@lucene.apache.org
Subject: SOLR Security

I am using the Ajax-Solr framework for creating a search interface. The search interface works well. In my case, the results have document-level security, so indexing each record with its authorized users lets me filter results per user based on the user's authentication.

The problem is that I always have to pass a parameter to the SOLR server with userid={xyz}, which one can figure out from the SOLR URL (the Ajax call URL) using the Firebug tool in the Net console in Firefox, and then change this parameter value to see other users' records which he/she is not authorized to see. Basically it is a cross-site scripting issue.

I have read about some approaches to Solr security, like Nginx with Jetty and .htaccess-based security. Overall, what I understand from this is that we can restrict users from doing update/delete operations on SOLR, and we can also restrict the SOLR admin interface to certain IPs. But how can I restrict the {solr-server}/solr/select results from being accessed under different user ids?
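A sketch of the app-server approach suggested in the reply: the browser never talks to Solr; the application receives the search, discards any client-supplied identity parameters, and injects the authenticated session's user id itself, so tampering with the URL in Firebug cannot widen the results. All names here (authorized_for, SOLR_SELECT, build_solr_params) are invented for illustration.

```python
# Internal Solr endpoint, reachable only from the app server (illustrative URL).
SOLR_SELECT = "http://internal-solr:8983/solr/select"

def build_solr_params(client_params: dict, session_user_id: str) -> dict:
    """Rewrite client search params, injecting the trusted user filter."""
    # Drop anything the client could use to spoof identity or filters.
    params = {k: v for k, v in client_params.items()
              if k not in ("userid", "fq")}
    # The filter comes from the server-side session, never from the request.
    params["fq"] = f"authorized_for:{session_user_id}"
    return params

safe = build_solr_params({"q": "report", "userid": "999"}, "xyz")
```

The app server then forwards safe to SOLR_SELECT and relays the response, so the only filter Solr ever sees is the one derived from the authenticated session.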
Populating 'multivalue' fields (m:1 relationships)
I am attempting to index a DB schema that has a many:one relationship. I assume I would index this within Solr as a 'multiValued="true"' field, is that correct?

I am currently populating the Solr index with a stored procedure in which each DB record is "flattened" into a single document in Solr. I would like one of those Solr document fields to contain multiple values from the m:1 table (i.e. [fieldName]=1,3,6,8,7). I then need to be able to do "fq=fieldName:3" and have that query return the record above.

My question is: how do I populate Solr with a multivalued field for many:1 relationships? My first guess would be to concatenate all the values from the 'many' side into a single DB column in the SP, then pipe that column into a multiValued Solr field. The DB side of that will be ugly, but would the Solr side index this properly? If so, what would be the delimiter that would allow Solr to index each element of the multivalued field?

[Warning: possible tangent below...but I think this question is relevant. If not, tell me and I'll break it out.]

I have gone out of my way to "flatten" the data within my SP prior to giving it to Solr. For my solution stated above, I would have the following data (Title being the "many" side of the m:1, and PK being the Solr unique ID):

PK   | Name    | Title
Pk_1 | Dwight  | Sales, Assistant To The Regional Manager
Pk_2 | Jim     | Sales
Pk_3 | Michael | Regional Manager

Below is an example of a non-flattened record set. How would Solr handle a data set in which the following data was indexed:

PK   | Name    | Title
Pk_1 | Dwight  | Sales
Pk_1 | Dwight  | Assistant To The Regional Manager
Pk_2 | Jim     | Sales
Pk_3 | Michael | Regional Manager

My assumption is that the second Pk_1 record would overwrite the first, thereby losing the "Sales" title from Pk_1. Am I correct in that assumption?

I'm new to this ballgame, so don't be shy about pointing me down a different path if I am doing anything incorrectly.

Thanks!

Mike Klostermeyer
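On the delimiter question: DIH ships a RegexTransformer whose splitBy attribute splits a concatenated column into multiple values for a multiValued field, so the comma-delimited Title column from the flattened result set can be expanded without inventing a custom delimiter scheme. A sketch (entity and procedure names are invented):

```xml
<!-- data-config.xml sketch: names are invented, not the actual SP. -->
<entity name="person"
        transformer="RegexTransformer"
        query="EXEC dbo.GetFlattenedPeople">
  <!-- Splits "Sales, Assistant To The Regional Manager" into two
       values of the multiValued Title field. -->
  <field column="Title" splitBy=",\s*"/>
</entity>
```

The overwrite assumption in the message is also correct: adding a second document with the same unique key replaces the first one wholesale, so the flattened one-row-per-PK shape is the right input, with the split happening on the Solr side.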
RE: Auto suggest on indexed file content filtered based on user
I'm new to Solr, but I would think fq=[userName] would work here: http://wiki.apache.org/solr/CommonQueryParameters#fq

Mike

-----Original Message-----
From: prakash_ajp [mailto:prakash_...@yahoo.com]
Sent: Tuesday, April 24, 2012 11:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Auto suggest on indexed file content filtered based on user

Right now, the query is a very simple one, something like q=text. Basically, it would return ['textview', 'textviewer', ...]

But the issue is, the 'textviewer' could be from a file that is out of bounds for this user. So, ultimately I would like to include the userName in the query. As mentioned earlier, userName is another field in the main index.

--
View this message in context: http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3935765.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: using stored procedures in solr query..
Yes, I just did this in my DIH with SQL Server 2008 and Solr 3.5; it looked somewhat like the following:

Mike Klostermeyer

-----Original Message-----
From: vighnesh [mailto:svighnesh...@gmail.com]
Sent: Tuesday, April 03, 2012 5:29 AM
To: solr-user@lucene.apache.org
Subject: using stored procedures in solr query..
Importance: Low

Hi all,

Is it possible to execute stored procedures placed in the data-config.xml file in Solr? Please give a response...

Thanks in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/using-storedprocedures-in-solr-query-tp3880557p3880557.html
Sent from the Solr - User mailing list archive at Nabble.com.
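The inline config example did not survive in the archive, so here is a hypothetical sketch of what a DIH entity calling a SQL Server stored procedure typically looks like. Every name below (the database, procedure, credentials, and entity) is invented and is not Mike's actual configuration; the ${dataimporter.request.uniqueID} variable reflects how DIH exposes custom request parameters, which should be verified against your version.

```xml
<!-- data-config.xml sketch: all names are invented. -->
<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=MyDb"
              user="solr" password="***"/>
  <document>
    <entity name="myEntityName"
            query="EXEC dbo.GetSolrDocs @ID = '${dataimporter.request.uniqueID}'"/>
  </document>
</dataConfig>
```

The key point from the thread is simply that the entity's query attribute can hold an EXEC statement rather than a SELECT; the driver returns the procedure's result set and DIH maps its columns to fields as usual.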