Re: Events on updating documents

2021-01-21 Thread Walter Underwood
Solr is not a database. I strongly recommend that you NOT use it as a data 
store. You will lose data.

Solr does not have transactions. Don’t think of a Solr “commit” as a database 
commit. It is a command to start indexing the queued updates. It does not even 
attempt to meet ACID properties.

Redesign your system to use a database as a data store.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 20, 2021, at 11:49 PM, haris.k...@vnc.biz wrote:
> 
> Hello,
> 
> We at VNC are using Solr for search and as a data store. We have a use-case 
> in which we want to hit a REST endpoint whenever documents are inserted, 
> updated, or deleted in Solr with the documents under consideration as well. 
> When exploring the Solr documentation, we found Event Listeners with
> postCommit and postOptimize events. We have configured Solr to do
> soft-commits every second and hard-commits every ten minutes to keep 
> real-time indexing intact. With that in mind the questions are:
> 
> Do we get the documents updated in the postCommit event? (Not able to find 
> any examples)
> Are there other events that are triggered when a doc is updated, deleted, or 
> inserted like those we have in an RDBMS?
> Is there a postSoftCommit event as well? (not mentioned in official docs)
> 
> Kind regards
> 
> Muhammad Haris Khan
> 
> VNC - Virtual Network Consult
> 
> -- Solr Engineer --



Events on updating documents

2021-01-21 Thread haris . khan
Hello, 

We at VNC are using Solr for search and as a data store. We have a use-case in 
which we want to hit a REST endpoint whenever documents are inserted, updated, 
or deleted in Solr with the documents under consideration as well. When 
exploring the Solr documentation, we found Event Listeners with postCommit and 
postOptimize events. We have configured Solr to do soft-commits every second 
and hard-commits every ten minutes to keep real-time indexing intact. With that 
in mind the questions are: 

Do we get the documents updated in the postCommit event? (Not able to find any 
examples)
Are there other events that are triggered when a doc is updated, deleted, or 
inserted like those we have in an RDBMS?
Is there a postSoftCommit event as well? (not mentioned in official docs)

Kind regards

Muhammad Haris Khan

VNC - Virtual Network Consult

-- Solr Engineer --


Re: updating documents via csv

2019-12-17 Thread Paras Lehana
Oh lol. How could I miss that! This is actually true for any bash command.
Glad that it worked.

On Wed, 18 Dec, 2019, 00:29 rhys J,  wrote:

> On Mon, Dec 16, 2019 at 11:58 PM Paras Lehana 
> wrote:
>
> > Hi Rhys,
> >
> > I use CDATA for XMLs:
> >
> >
> >  
> >
> > There should be a similar solution for JSON though I couldn't find the
> > specific one on the internet. If you are okay to use XMLs for indexing, you
> > can use this.
> >
> >
> We are set on using json, but I figured out how to handle the single quote.
>
> If I use curl with double quotes (") on the outside and single quotes
> inside, I can escape the single quote in the field with no problem.
>
> Thanks for the help!
>
> Rhys
>



Re: updating documents via csv

2019-12-17 Thread rhys J
On Mon, Dec 16, 2019 at 11:58 PM Paras Lehana 
wrote:

> Hi Rhys,
>
> I use CDATA for XMLs:
>
>
>  
>
> There should be a similar solution for JSON though I couldn't find the
> specific one on the internet. If you are okay to use XMLs for indexing, you
> can use this.
>
>
We are set on using json, but I figured out how to handle the single quote.

If I use curl with double quotes (") on the outside and single quotes
inside, I can escape the single quote in the field with no problem.
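
For reference, a minimal Python sketch of the quoting problem discussed in this thread. It is not from the original messages: it builds the same atomic-update payload with `json.dumps` and lets `shlex.quote` produce a shell-safe argument, so the embedded single quote (and a later `&`) needs no hand escaping. The URL and field names are copied from the original post; the `Content-Type` header and everything else are illustrative assumptions.

```python
import json
import shlex

# Payload from the original post: an atomic "set" on two fields.
payload = json.dumps([
    {"id": "356767",
     "name1": {"set": "NORTH AMERICAN INT'L"},
     "name2": {"set": " "}}
])

url = "http://localhost:8983/solr/dbtr/update?commit=true"

# shlex.quote wraps each argument in single quotes and rewrites every
# embedded ' as '"'"', which a POSIX shell reads back as a literal quote.
command = " ".join([
    "curl", shlex.quote(url),
    "-H", shlex.quote("Content-Type: application/json"),
    "-d", shlex.quote(payload),
])

# Round-trip check: splitting the command the way a POSIX shell would
# must give back the exact JSON text.
parsed_args = shlex.split(command)
print(command)
```

If the request is sent from a script anyway, passing the arguments as a list to `subprocess.run` avoids the shell, and therefore the quoting question, altogether.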

Thanks for the help!

Rhys


Re: updating documents via csv

2019-12-16 Thread Paras Lehana
Hi Rhys,

I use CDATA for XMLs:

   
 

There should be a similar solution for JSON though I couldn't find the
specific one on the internet. If you are okay to use XMLs for indexing, you
can use this.

On Tue, 17 Dec 2019 at 01:40, rhys J  wrote:

> Is there a way to update documents already stored in the solr cores via
> csv?
>
> The reason I am asking is because I am running into a problem with updating
> via script with single quotes embedded into the field itself.
>
> Example:
>
> curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ "id":
> "356767", "name1": {"set": "NORTH AMERICAN INT'L"},"name2": {"set": " "}}]'
>
> I have tried the following as well:
>
> curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ "id":
> "356767", "name1": {"set": "NORTH AMERICAN INT\'L"},"name2": {"set": "
> "}}]'
>
> curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ "id":
> "356767", "name1": {"set": "NORTH AMERICAN INT\\'L"},"name2": {"set": "
> "}}]'
>
> curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ \\"id\\":
> \\"356767\\", \\"name1\\": {\\"set\\": \\"NORTH AMERICAN INT\\'L\\"},}]'
>
> All of these break on the single quote embedded in field name1.
>
> Does anyone have any ideas as to what I can do to get around this?
>
> I will also eventually need to get around having an & inside a field too,
> but that hasn't come up yet.
>
> Thanks,
>
> Rhys
>


-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



updating documents via csv

2019-12-16 Thread rhys J
Is there a way to update documents already stored in the solr cores via csv?

The reason I am asking is because I am running into a problem with updating
via script with single quotes embedded into the field itself.

Example:

curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ "id":
"356767", "name1": {"set": "NORTH AMERICAN INT'L"},"name2": {"set": " "}}]'

I have tried the following as well:

curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ "id":
"356767", "name1": {"set": "NORTH AMERICAN INT\'L"},"name2": {"set": " "}}]'

curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ "id":
"356767", "name1": {"set": "NORTH AMERICAN INT\\'L"},"name2": {"set": "
"}}]'

curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ \\"id\\":
\\"356767\\", \\"name1\\": {\\"set\\": \\"NORTH AMERICAN INT\\'L\\"},}]'

All of these break on the single quote embedded in field name1.

Does anyone have any ideas as to what I can do to get around this?

I will also eventually need to get around having an & inside a field too,
but that hasn't come up yet.

Thanks,

Rhys


Re: Updating documents and commit/rollback

2018-03-05 Thread Christopher Schultz

Shawn,

On 3/2/18 7:46 PM, Shawn Heisey wrote:
> On 3/2/2018 10:39 AM, Christopher Schultz wrote:
>> The problem is that I'm updating the index after my SQL UPDATE(s)
>> have run, but before my SQL COMMIT occurs. I have had a problem
>> where the SQL fails and rolls-back, but the solrClient is not
>> rolled-back.
>> 
>> I'm a little wary of rolling-back Solr because, as I understand
>> it, the client itself doesn't carry any transactional
>> information. That is, it should be a shared-resource (within the
>> web application) and indeed, other clients could be connecting
>> from other places (like other app servers running the same
>> application). Performing either commit() or rollback() on the
>> Solr client will commit/rollback *all* writes since the last
>> commit, right?
> 
> Correct.  Relational databases typically keep track of transactions
> on one connection separately from transactions on another
> connection, and can roll one of them back without affecting the
> others.
> 
> Solr doesn't have this capability.  The reason that it doesn't have
> this capability is that Lucene doesn't have it, and the majority of
> Solr functionality is provided by Lucene.
> 
> If updates are happening concurrently from multiple sources, then 
> there's no way to have any kind of meaningful rollback.
> 
> I see two solutions:
> 
> 1) Funnel all updates through a single thread/process, which will
> not move on from one update to another until the final decision is
> made about that update.  Then rolling back becomes possible,
> because there is only one source for updates.  The disadvantage
> here is that this thread/process becomes a bottleneck, and
> performance may suffer greatly.  Also, it can be a single point of
> failure.  If the rate of updates is low, then the bottleneck may
> not be a problem.
> 
> 2) Have your updating software revert the changes "manually" in 
> situations where the SQL change is rolled back ... by either
> deleting the record or sending another update to change values back
> to what they were before.

Yeah, technique #2 was the only thing I could come up with that made
any sense. Serializing updates is probably more trouble than it's worth.

In an environment where I'd probably expect to have maybe 50 - 100
"writes" daily to a Solr core, how do you recommend commits be done?
The documents are quite small (user metadata like username, first/last
and email). Can I add/commit simultaneously? There seems to be no
reason to perform separate add/commit steps in this scenario.

-chris


Re: Updating documents and commit/rollback

2018-03-02 Thread Shawn Heisey
On 3/2/2018 10:39 AM, Christopher Schultz wrote:
> The problem is that I'm updating the index after my SQL UPDATE(s) have
> run, but before my SQL COMMIT occurs. I have had a problem where the SQL
> fails and rolls-back, but the solrClient is not rolled-back.
>
> I'm a little wary of rolling-back Solr because, as I understand it, the
> client itself doesn't carry any transactional information. That is, it
> should be a shared-resource (within the web application) and indeed,
> other clients could be connecting from other places (like other app
> servers running the same application). Performing either commit() or
> rollback() on the Solr client will commit/rollback *all* writes since
> the last commit, right?

Correct.  Relational databases typically keep track of transactions on
one connection separately from transactions on another connection, and
can roll one of them back without affecting the others.

Solr doesn't have this capability.  The reason that it doesn't have this
capability is that Lucene doesn't have it, and the majority of Solr
functionality is provided by Lucene.

If updates are happening concurrently from multiple sources, then
there's no way to have any kind of meaningful rollback.

I see two solutions:

1) Funnel all updates through a single thread/process, which will not
move on from one update to another until the final decision is made
about that update.  Then rolling back becomes possible, because there is
only one source for updates.  The disadvantage here is that this
thread/process becomes a bottleneck, and performance may suffer
greatly.  Also, it can be a single point of failure.  If the rate of
updates is low, then the bottleneck may not be a problem.

2) Have your updating software revert the changes "manually" in
situations where the SQL change is rolled back ... by either deleting
the record or sending another update to change values back to what they
were before.
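
Option 2 above can be sketched as follows. This is only an illustration under stated assumptions: `FakeIndex` is a hypothetical in-memory stand-in for a Solr core (it is not a Solr or SolrJ API), the `user` table is made up, and `sqlite3` plays the role of the RDBMS. The point is the shape of the compensation: remember the old index state, and if the SQL transaction rolls back, put it back by hand.

```python
import sqlite3


class FakeIndex:
    """Hypothetical in-memory stand-in for a Solr core (not a Solr API)."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        self.docs[doc["id"]] = dict(doc)

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)


def update_user(conn, index, user_id, last_name, fail=False):
    """Update SQL and the index; undo the index change if SQL rolls back."""
    previous = index.docs.get(user_id)  # remember the old index state
    try:
        conn.execute("UPDATE user SET last_name = ? WHERE id = ?",
                     (last_name, user_id))
        index.add({"id": user_id, "last_name": last_name})
        if fail:
            raise RuntimeError("simulated failure before SQL COMMIT")
        conn.commit()
    except Exception:
        conn.rollback()
        # Compensate "manually": restore the old document, or delete the
        # new one if there was nothing indexed before.
        if previous is None:
            index.delete(user_id)
        else:
            index.add(previous)
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id TEXT PRIMARY KEY, last_name TEXT)")
conn.execute("INSERT INTO user VALUES ('u1', 'Smith')")
conn.commit()

index = FakeIndex()
index.add({"id": "u1", "last_name": "Smith"})

ok = update_user(conn, index, "u1", "Jones", fail=True)
row = conn.execute("SELECT last_name FROM user WHERE id = 'u1'").fetchone()
print(ok, row[0], index.docs["u1"]["last_name"])
```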

Thanks,
Shawn



Updating documents and commit/rollback

2018-03-02 Thread Christopher Schultz
Hey, folks. I've been a long-time Lucene user (running a hilariously-old
1.9.1 version forever), but I'm only just now getting into using Solr.

My particular use-case is storing information about web-application
users so they can be found more quickly than our current RDBMS-based
search (SELECT ... FROM user WHERE username LIKE '%foo%' OR
email_address LIKE '%foo%' OR last_name LIKE '%foo%'...).

I've set up my Solr (very basic... just untar, bin/solr start), created
a core/collection (I'm running single-server for now, no cloudy
zookeeper stuff ATM), customized my schema (using the Schema API, since
hand-editing is discouraged) and loaded my data. I can search just fine
through the Solr dashboard.

I've also used solr-solrj to perform searches from within my
application, replacing the previous JDBC-based search with the
Solr-based one. All is well.

Now I'm trying to figure out the best way to update users in the index
when their information (e.g. first/last names) changes. I have used
solr-solrj quite simply like this:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", user.getId());
doc.addField("username", user.getUsername());
doc.addField("first_name", user.getFirstName());
doc.addField("last_name", user.getLastName());
...
solrClient.add("users", doc);
solrClient.commit();

I'm having a problem, though, and I'd like to know what the "right"
solution is.

The problem is that I'm updating the index after my SQL UPDATE(s) have
run, but before my SQL COMMIT occurs. I have had a problem where the SQL
fails and rolls-back, but the solrClient is not rolled-back.

I'm a little wary of rolling-back Solr because, as I understand it, the
client itself doesn't carry any transactional information. That is, it
should be a shared-resource (within the web application) and indeed,
other clients could be connecting from other places (like other app
servers running the same application). Performing either commit() or
rollback() on the Solr client will commit/rollback *all* writes since
the last commit, right?

That means that there is no meaningful way that I can say to Solr "oops,
I actually need you to NOT add that document I just told you about".
Instead, I have to either commit the document I don't want (and, I
dunno, delete it later or whatever) or risk rolling-back other writes
that other clients have performed.

Do I have that right?

So... what's the best way to do this kind of thing? Can I ask Solr to
add-and-commit at the same time? If so, how? Is there a meaningful
"rollback this one addition" that I can perform? If so, how?

Thanks for a great product,
-chris





Re: Updating documents with docvalues (not stored), commit question

2016-11-17 Thread Erick Erickson
I'm pretty sure that atomic updates use Real Time Get which means they'll
pull the values from in-memory structures for docs that haven't been
committed yet.

And as Shawn says, docValues isn't relevant here.

Best,
Erick

On Thu, Nov 17, 2016 at 5:52 AM, Shawn Heisey  wrote:
> On 11/17/2016 6:26 AM, Dorian Hoxha wrote:
>> Looks like you can update documents even using just doc-values
>> (without stored). While I understand the columnar-format, my issue
>> with this is that docValues are added when a 'commit' is done
>> (right?). Does that mean that it will force a commit (which is a slow
> >> operation) when updating with docValues or does it do something
> >> smarter?
>
> The presence or absence of docValues does not change commits at all.  A
> commit is a separate operation from indexing, although you can send
> commit=true with an indexing request and it would be started as soon as
> all the indexing for that request is done.
>
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> The URL above says "SolrCloud" but what it says also applies to
> non-cloud installs.
>
> Thanks,
> Shawn
>


Re: Updating documents with docvalues (not stored), commit question

2016-11-17 Thread Shawn Heisey
On 11/17/2016 6:26 AM, Dorian Hoxha wrote:
> Looks like you can update documents even using just doc-values
> (without stored). While I understand the columnar-format, my issue
> with this is that docValues are added when a 'commit' is done
> (right?). Does that mean that it will force a commit (which is a slow
> operation) when updating with docValues or does it do something
> smarter?

The presence or absence of docValues does not change commits at all.  A
commit is a separate operation from indexing, although you can send
commit=true with an indexing request and it would be started as soon as
all the indexing for that request is done.
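
As a rough illustration of the request forms this refers to, the sketch below only builds update URLs; it sends nothing. The host and the `users` core name are assumptions. `commit=true` is the parameter named above; `commitWithin` (in milliseconds) is another update parameter and is included only for comparison.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/users/update"  # assumed host and core name


def update_url(**params):
    """Return the update URL with the given request parameters appended."""
    return BASE + ("?" + urlencode(params) if params else "")


# Index and commit in one request: the commit starts once the documents
# in this request have been indexed.
index_and_commit = update_url(commit="true")

# Index only; a commit is sent later as its own request (or by autoCommit).
index_only = update_url()

# Ask Solr to commit on its own within 10 seconds of this update.
index_commit_within = update_url(commitWithin=10000)

print(index_and_commit)
print(index_only)
print(index_commit_within)
```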

https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

The URL above says "SolrCloud" but what it says also applies to
non-cloud installs.

Thanks,
Shawn



Updating documents with docvalues (not stored), commit question

2016-11-17 Thread Dorian Hoxha
Looks like you can update documents even using just doc-values (without
stored). While I understand the columnar-format, my issue with this is that
docValues are added when a 'commit' is done (right?). Does that mean that
it will force a commit (which is a slow operation) when updating with
docValues or does it do something smarter?

Thank You


Re: updating documents unintentionally adds extra values to certain fields

2013-04-22 Thread Chris Hostetter

: I am using solr 4.2, and have set up spatial search config as below
: 
: http://wiki.apache.org/solr/SpatialSearch#Schema_Configuration
: 
: But every time I make an update to a document,
: http://wiki.apache.org/solr/UpdateJSON#Updating_a_Solr_Index_with_JSON
: 
: more values of the *_coordinates fields gets inserted, even though it was
: not set to multivalue & this behavior doesn't happen to any of the other
: fields.

Can you elaborate on what exactly you mean by "more values of the
*_coordinates fields gets inserted"?

FYI...

atomic updates work by leveraging the existing stored values of fields;
independently, the LatLonType field works by creating on the fly sub 
fields representing internal state.

My hunch is that you don't actually have the LatLonType set up exactly as
described in the wiki you linked to, where "*_coordinate" is configured
with 'stored="false"' ... my hunch is that you have the *_coordinate
dynamicField configured with stored="true", and so when you do an atomic
update the old (stored) sub-field values are copied over and the (new)
sub-field values are generated again by LatLonType.
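
A toy model of the mechanism described above, in plain Python rather than Solr code. The field names (`store`, `store_0_coordinate`, `store_1_coordinate`) and both functions are assumptions made for illustration; the sketch only shows why stored sub-fields end up with one extra value per atomic update while non-stored ones do not.

```python
def lat_lon_subfields(name, value):
    """Toy model of LatLonType: derive two sub-field values from "lat,lon"."""
    lat, lon = value.split(",")
    return {f"{name}_0_coordinate": [float(lat)],
            f"{name}_1_coordinate": [float(lon)]}


def atomic_update(stored_doc, changes):
    """Toy atomic update: copy stored values, apply changes, re-derive."""
    doc = {k: list(v) for k, v in stored_doc.items()}  # old stored values
    for field, value in changes.items():
        doc[field] = [value]
    # The field type generates its sub-fields again for the rebuilt document.
    # If old sub-field values were stored, they are already in `doc`.
    for key, vals in lat_lon_subfields("store", doc["store"][0]).items():
        doc.setdefault(key, []).extend(vals)
    return doc


def stored_view(subfields_stored):
    """What the index hands back as stored values for document 1."""
    doc = {"id": ["1"], "store": ["45.17,-93.87"], "name": ["old"]}
    if subfields_stored:
        doc.update(lat_lon_subfields("store", "45.17,-93.87"))
    return doc


bad = atomic_update(stored_view(True), {"name": "new"})
good = atomic_update(stored_view(False), {"name": "new"})
print(bad["store_0_coordinate"])   # stored sub-field: old copy plus new value
print(good["store_0_coordinate"])  # non-stored sub-field: new value only
```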


-Hoss


updating documents unintentionally adds extra values to certain fields

2013-04-18 Thread joyce chan
Hi

I am using solr 4.2, and have set up spatial search config as below

http://wiki.apache.org/solr/SpatialSearch#Schema_Configuration

But every time I make an update to a document,
http://wiki.apache.org/solr/UpdateJSON#Updating_a_Solr_Index_with_JSON

more values of the *_coordinates fields gets inserted, even though it was
not set to multivalue & this behavior doesn't happen to any of the other
fields.

Any ideas how to avoid adding extra values to the _coordinates fields on
updates?


Re: Updating documents

2012-07-13 Thread Yonik Seeley
On Fri, Jul 13, 2012 at 3:50 PM, Jonatan Fournier
 wrote:
> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
>  wrote:
> But later on when I want to "append" cat3 to the field by doing this:
>
> "mv_f":{"add":"cat3"},
> ...
>
> I end up with something like this in the index:
>
> "mv_f":["{add=cat3}"],
>
> Obviously something is wrong with my syntax ;)

Are you using a custom update processor chain?  The
DistributedUpdateProcessor currently contains the logic for optimistic
concurrency and updates.
If you're not already, try some test commands with the stock server.

If you are already using the stock server, then perhaps you're not
sending what you think you are to Solr?
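
One way to check what is actually being sent is to build the request body in code and print it. A minimal sketch, assuming the document id `foo` from earlier in the thread and the `mv_f` field from the message above; it only builds the JSON text and sends nothing.

```python
import json

# Initial document: the multi-valued field is sent here as one JSON array
# (the repeated-key form from the thread cannot be expressed as a Python
# dict, which holds each key once).
create = [{"id": "foo", "mv_f": ["cat1", "cat2"]}]

# Atomic update: the modifier object {"add": ...} is the field's value,
# inside a document that also carries the unique key.
append = [{"id": "foo", "mv_f": {"add": "cat3"}}]

create_body = json.dumps(create)
append_body = json.dumps(append)
print(create_body)
print(append_body)
```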

-Yonik
http://lucidimagination.com


Re: Updating documents

2012-07-13 Thread Jonatan Fournier
On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
 wrote:
> Yonik,
>
> On Thu, Jul 12, 2012 at 12:52 PM, Yonik Seeley
>  wrote:
>> On Thu, Jul 12, 2012 at 12:38 PM, Jonatan Fournier
>>  wrote:
>>> On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson
>>>> The partial documents update that Jonatan references also requires
>>>> that all the fields be stored.
>>>
>>> If my only fields with stored="false" are copyField (e.g. I don't need
>>> their content to rebuild the document), are they gonna be re-copied
>>> with the partial document update?
>>
>> Correct - your setup should be fine.  Only original source fields (non
>> copyField targets) should have stored=true
>
> Another question I had related to partial update...
>
> $ ./post.sh foo.json
> {"responseHeader":{"status":409,"QTime":0},"error":{"msg":"Document
> not found for update.  id=foo","code":409}}
>
> Is there a flag for: if document does not exist, create it for me? The
> thing is that I don't know in advance if the document already exists
> (of course I could query first... but I have millions of entries to
> process, might exist, might be an update I don't know...)
>
> My naive approach was to have in the same request two documents, one
> with only "set" using the unique ID, and then in the second one all
> the "add" (concerning multivalue field).
>
> So it would do the following:
>
> 1. Document (with id) exist or not don't care, use the following "set"
> command to update/create
> 2. 2nd pass, I know you exist (with above id), please add all those to
> the multivalue fields (none of those fields are in the initial
> updates)
>
> My rationale is that if the document exists, reset some fields, and
> then append the multivalue fields (those multivalue fields express
> historical updates)

Probably silly mistake on my side, but I don't seem to get the
"append/add" JSON syntax right for multiValue fields...

On my document initial creation it works great with

...
"mv_f":"cat1",
"mv_f":"cat2",
...

But later on when I want to "append" cat3 to the field by doing this:

"mv_f":{"add":"cat3"},
...

I end up with something like this in the index:

"mv_f":["{add=cat3}"],

Obviously something is wrong with my syntax ;)

--
jonatan

>
> The reason I created 2 documents is that Solr doesn't seem happy if I
> mix set and add in the same document :)
>
> --
> jonatan
>
>>
>> -Yonik
>> http://lucidimagination.com


Re: Updating documents

2012-07-13 Thread Yonik Seeley
>> I've just committed this change.
>
> Super thanks! I assume it will end up in the 4.0 release?

Yep!

-Yonik
http://lucidimagination.com


Re: Updating documents

2012-07-13 Thread Jonatan Fournier
On Fri, Jul 13, 2012 at 1:43 PM, Yonik Seeley
 wrote:
> On Fri, Jul 13, 2012 at 1:41 PM, Jonatan Fournier
>  wrote:
>> On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley
>>  wrote:
>>> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
>>>  wrote:
>>>> Is there a flag for: if document does not exist, create it for me?
>>>
>>> Not currently, but it certainly makes sense.
>>> The implementation should be easy. The most difficult part is figuring
>>> out the best syntax to specify this.
>>>
>>> Another idea: we could possibly switch to create-if-not-exist by
>>> default, and use the existing optimistic concurrency mechanism to
>>> specify that the document should exist.
>>>
>>> So specify _version_=1 if the document should exist and _version_=0
>>> (the default) if you don't care.
>>
>> Yes that would be neat!
>
> I've just committed this change.

Super thanks! I assume it will end up in the 4.0 release?

>
>> One more question related to partial document update. So far I'm able
>> to append to multivalue fields, set new value to regular/multivalue
>> fields. One thing I didn't find is the "remove" command, what is its
>> JSON syntax?
>
> Set it to the JSON value of null.
>
> -Yonik
> http://lucidimagination.com


Re: Updating documents

2012-07-13 Thread Yonik Seeley
On Fri, Jul 13, 2012 at 1:41 PM, Jonatan Fournier
 wrote:
> On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley
>  wrote:
>> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
>>  wrote:
>>> Is there a flag for: if document does not exist, create it for me?
>>
>> Not currently, but it certainly makes sense.
>> The implementation should be easy. The most difficult part is figuring
>> out the best syntax to specify this.
>>
>> Another idea: we could possibly switch to create-if-not-exist by
>> default, and use the existing optimistic concurrency mechanism to
>> specify that the document should exist.
>>
>> So specify _version_=1 if the document should exist and _version_=0
>> (the default) if you don't care.
>
> Yes that would be neat!

I've just committed this change.

> One more question related to partial document update. So far I'm able
> to append to multivalue fields, set new value to regular/multivalue
> fields. One thing I didn't find is the "remove" command, what is its
> JSON syntax?

Set it to the JSON value of null.
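
A small sketch of what that looks like as a request body, assuming the `set` modifier discussed earlier in the thread. `old_field` is a hypothetical field name and `foo` is the id used in earlier messages; Python's `None` serializes to JSON `null`.

```python
import json

# Setting a field to null in an atomic update removes its value(s).
remove_field = [{"id": "foo", "old_field": {"set": None}}]

body = json.dumps(remove_field)
print(body)
```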

-Yonik
http://lucidimagination.com


Re: Updating documents

2012-07-13 Thread Jonatan Fournier
On Fri, Jul 13, 2012 at 12:57 AM, Yonik Seeley
 wrote:
> On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
>  wrote:
>> Is there a flag for: if document does not exist, create it for me?
>
> Not currently, but it certainly makes sense.
> The implementation should be easy. The most difficult part is figuring
> out the best syntax to specify this.
>
> Another idea: we could possibly switch to create-if-not-exist by
> default, and use the existing optimistic concurrency mechanism to
> specify that the document should exist.
>
> So specify _version_=1 if the document should exist and _version_=0
> (the default) if you don't care.

Yes that would be neat!

One more question related to partial document update. So far I'm able
to append to multivalue fields, set new value to regular/multivalue
fields. One thing I didn't find is the "remove" command, what is its
JSON syntax?

Thanks,

--
jonatan

>
> -Yonik
> http://lucidimagination.com


Re: Updating documents

2012-07-12 Thread Yonik Seeley
On Thu, Jul 12, 2012 at 3:20 PM, Jonatan Fournier
 wrote:
> Is there a flag for: if document does not exist, create it for me?

Not currently, but it certainly makes sense.
The implementation should be easy. The most difficult part is figuring
out the best syntax to specify this.

Another idea: we could possibly switch to create-if-not-exist by
default, and use the existing optimistic concurrency mechanism to
specify that the document should exist.

So specify _version_=1 if the document should exist and _version_=0
(the default) if you don't care.
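
A sketch of how a client might attach `_version_` under the scheme proposed above. It mirrors the proposal as worded in this message only; how a given Solr release treats these values should be checked against its documentation. The helper `with_version` and the `title` field are hypothetical.

```python
import json


def with_version(doc, must_exist):
    """Return a copy of doc with _version_ set per the proposal above."""
    versioned = dict(doc)
    # 1: the document should already exist; 0: don't care (create if missing).
    versioned["_version_"] = 1 if must_exist else 0
    return versioned


update = {"id": "foo", "title": {"set": "new title"}}  # "title" is hypothetical

strict = json.dumps([with_version(update, must_exist=True)])
lenient = json.dumps([with_version(update, must_exist=False)])
print(strict)
print(lenient)
```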

-Yonik
http://lucidimagination.com


Re: Updating documents

2012-07-12 Thread Jonatan Fournier
Yonik,

On Thu, Jul 12, 2012 at 12:52 PM, Yonik Seeley
 wrote:
> On Thu, Jul 12, 2012 at 12:38 PM, Jonatan Fournier
>  wrote:
>> On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson
>>> The partial documents update that Jonatan references also requires
>>> that all the fields be stored.
>>
>> If my only fields with stored="false" are copyField (e.g. I don't need
>> their content to rebuild the document), are they gonna be re-copied
>> with the partial document update?
>
> Correct - your setup should be fine.  Only original source fields (non
> copyField targets) should have stored=true

Another question I had related to partial update...

$ ./post.sh foo.json
{"responseHeader":{"status":409,"QTime":0},"error":{"msg":"Document
not found for update.  id=foo","code":409}}

Is there a flag for: if document does not exist, create it for me? The
thing is that I don't know in advance if the document already exists
(of course I could query first... but I have millions of entries to
process, might exist, might be an update I don't know...)

My naive approach was to have in the same request two documents, one
with only "set" using the unique ID, and then in the second one all
the "add" (concerning multivalue field).

So it would do the following:

1. Document (with id) exist or not don't care, use the following "set"
command to update/create
2. 2nd pass, I know you exist (with above id), please add all those to
the multivalue fields (none of those fields are in the initial
updates)

My rationale is that if the document exists, reset some fields, and
then append the multivalue fields (those multivalue fields express
historical updates)

The reason I created 2 documents is that Solr doesn't seem happy if I
mix set and add in the same document :)

--
jonatan

>
> -Yonik
> http://lucidimagination.com


Re: Updating documents

2012-07-12 Thread Yonik Seeley
On Thu, Jul 12, 2012 at 12:38 PM, Jonatan Fournier
 wrote:
> On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson
>> The partial documents update that Jonatan references also requires
>> that all the fields be stored.
>
> If my only fields with stored="false" are copyField (e.g. I don't need
> their content to rebuild the document), are they gonna be re-copied
> with the partial document update?

Correct - your setup should be fine.  Only original source fields (non
copyField targets) should have stored=true

-Yonik
http://lucidimagination.com


Re: Updating documents

2012-07-12 Thread Jonatan Fournier
Erick,

On Thu, Jul 12, 2012 at 11:05 AM, Erick Erickson
 wrote:
> Vinicius:
>
> No, fetching the document from the index, changing selected values and
> re-indexing probably won't work at all. The problem is that you only get
> _stored_ values back from Solr. So unless you've specified 'stored="true"'
> for all your fields, you can't use the doc fetched from Solr to update a
> field.
>
> The partial documents update that Jonatan references also requires
> that all the fields be stored.

If my only fields with stored="false" are copyField (e.g. I don't need
their content to rebuild the document), are they gonna be re-copied
with the partial document update?

--
jonatan

>
> Your best bet is to go back to your system-of-record for the data
> and re-index the whole document.
>
> Best
> Erick
>
> On Wed, Jul 11, 2012 at 11:30 AM, Jonatan Fournier
>  wrote:
>> On Wed, Jul 11, 2012 at 10:57 AM, Vinicius Carvalho
>>  wrote:
>>> Hi there.
>>>
>>> I was checking the faq and found that solr does not support field updates
>>> right. So I assume that in order to update a document, one should first
>>> retrieve it by its Id and then change the required field and update the doc
>>> again. But then I wonder about fields that are indexed and not stored,
>>> since the new document that is sent to the index does not have the values,
> >>> would this mean we will lose them?
>>>
>>> BTW any chances we see field level updates on 4.0 like elastic search has?
>>
> >> I'm actually also looking at this new feature in 4.0-ALPHA:
>>
>> http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
>>
> >> I was wondering where the new xml tags were documented to support
> >> these "set", "add to multi-value" etc.
>>
>> --
>> jonatan
>>
>>>
>>> Regards
>>>
>>> --
>>> The intuitive mind is a sacred gift and the
>>> rational mind is a faithful servant. We have
>>> created a society that honors the servant and
>>> has forgotten the gift.


Re: Updating documents

2012-07-12 Thread Erick Erickson
Vinicius:

No, fetching the document from the index, changing selected values and
re-indexing probably
won't work at all. The problem is that you only get _stored_ values
back from Solr. So unless
you've specified 'stored="true" ' for all your fields, you can't use
the doc fetched from Solr to
update a field.

The partial documents update that Jonatan references also requires
that all the fields be stored.

Your best bet is to go back to your system-of-record for the data
and re-index the whole document.

Best
Erick

On Wed, Jul 11, 2012 at 11:30 AM, Jonatan Fournier
 wrote:
> On Wed, Jul 11, 2012 at 10:57 AM, Vinicius Carvalho
>  wrote:
>> Hi there.
>>
>> I was checking the faq and found that solr does not support field updates
>> right. So I assume that in order to update a document, one should first
>> retrieve it by its Id and then change the required field and update the doc
>> again. But then I wonder about fields that are indexed and not stored,
>> since the new document that is sent to the index does not have the values,
>> would this mean we will lose them?
>>
>> BTW, any chance we'll see field-level updates in 4.0 like Elasticsearch has?
>
> I'm actually also looking at this new feature in 4.0-ALPHA:
>
> http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
>
> I was wondering where the new XML tags were documented to support
> these "set", "add to multi-value" etc.
>
> --
> jonatan
>
>>
>> Regards
>>
>> --
>> The intuitive mind is a sacred gift and the
>> rational mind is a faithful servant. We have
>> created a society that honors the servant and
>> has forgotten the gift.
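
Erick's point about stored values can be illustrated with a toy model. This is not Solr code: the schema, the field names, and both helper functions are hypothetical, and the "index" is just a dictionary. It only shows why a fetch-change-reindex cycle drops any field that is indexed but not stored.

```python
# Toy model (not Solr code): every field is searchable, but a fetch only
# returns the fields marked as stored. The schema below is hypothetical.
SCHEMA = {"id": True, "title": True, "body": False}  # field name -> stored?


def fetch_stored(full_doc, schema=SCHEMA):
    """Return what a search result would contain: stored fields only."""
    return {name: value for name, value in full_doc.items()
            if schema.get(name, False)}


def fetch_change_reindex(full_doc, changes, schema=SCHEMA):
    """Fetch from the index, change selected fields, and send the result back."""
    fetched = fetch_stored(full_doc, schema)
    return {**fetched, **changes}


original = {"id": "42", "title": "old title", "body": "long indexed-only text"}
reindexed = fetch_change_reindex(original, {"title": "new title"})
print(reindexed)  # the unstored "body" field is no longer part of the document
```

Going back to the system-of-record avoids the problem because the full document, including the unstored fields, is rebuilt from the original data.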


Re: Updating documents

2012-07-11 Thread Jonatan Fournier
On Wed, Jul 11, 2012 at 10:57 AM, Vinicius Carvalho
 wrote:
> Hi there.
>
> I was checking the faq and found that solr does not support field updates
> right. So I assume that in order to update a document, one should first
> retrieve it by its Id and then change the required field and update the doc
> again. But then I wonder about fields that are indexed and not stored,
> since the new document that is sent to the index does not have the values,
> would this mean we will lose them?
>
> BTW, any chance we'll see field-level updates in 4.0 like Elasticsearch has?

I'm actually also looking at this new feature in 4.0-ALPHA:

http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

I was wondering where the new XML tags were documented to support
these "set", "add to multi-value" etc.

--
jonatan

>
> Regards
>
> --
> The intuitive mind is a sacred gift and the
> rational mind is a faithful servant. We have
> created a society that honors the servant and
> has forgotten the gift.
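
For readers looking for the "set" and "add" syntax Jonatan asks about, here is a minimal sketch that builds such an XML message. It assumes the Solr 4 convention of putting the modifier in an `update` attribute on each `<field>` element; the helper name, the field names, and the values are hypothetical, and the exact syntax should be checked against the documentation for the release in use.

```python
import xml.etree.ElementTree as ET


def atomic_update_xml(doc_id, updates, id_field="id"):
    """Build an <add> message; `updates` maps field name -> (modifier, value).

    Assumption: the modifier ("set", "add", ...) goes in an `update`
    attribute on the <field> element.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name=id_field).text = str(doc_id)
    for name, (modifier, value) in updates.items():
        field = ET.SubElement(doc, "field", name=name, update=modifier)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical fields: replace "status", append a value to multi-valued "tags".
message = atomic_update_xml(
    "123", {"status": ("set", "closed"), "tags": ("add", "reviewed")}
)
print(message)
```

The resulting string would be POSTed to the update handler. As noted elsewhere in the thread, this kind of update relies on the other fields being stored.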


Re: Relative performance of updating documents of different sizes

2011-08-30 Thread Markus Jelsma
Document size should not have any impact on deleting documents, as they are
only marked for deletion.

On Tuesday 30 August 2011 17:06:05 Jeff Leedy wrote:
> I was curious to know if anyone has any information about the relative
> performance of document updates (delete/add operations) on documents
> of different sizes. I have a use case in which I can either create
> large Solr documents first and subsequently add a small amount of
> information to them, or do the opposite (add the small doc first, then
> update with the big one.) My guess is that adding smaller ones first
> will be faster, since the time to delete a large document is
> presumably longer than the time to delete a small one.
> 
> Thanks,
> Jeff

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
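
Markus's remark can be pictured with a toy append-only segment. This is a simplified sketch, not Lucene's actual data structures; the class and attribute names are made up. A delete only flips a flag, so its cost does not depend on how large the document is, while the add half of an update still scales with the size of the new document.

```python
class ToySegment:
    """Toy append-only segment: deletes flip a flag, nothing is rewritten."""

    def __init__(self):
        self.docs = []   # document payloads, never modified in place
        self.live = []   # one boolean per stored payload
        self.by_id = {}  # document id -> position of the current live version

    def delete(self, doc_id):
        pos = self.by_id.pop(doc_id, None)
        if pos is not None:
            self.live[pos] = False  # same cost for a tiny or a huge document

    def update(self, doc_id, payload):
        self.delete(doc_id)                  # mark the old version, if any
        self.by_id[doc_id] = len(self.docs)
        self.docs.append(payload)            # cost grows with the payload size
        self.live.append(True)


seg = ToySegment()
seg.update("a", "x" * 1_000_000)  # large document first
seg.update("a", "small")          # then replace it with a small one
print(seg.live)
```

In this model the order "large first" versus "small first" only changes which add is expensive, not the cost of the delete.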


Relative performance of updating documents of different sizes

2011-08-30 Thread Jeff Leedy
I was curious to know if anyone has any information about the relative
performance of document updates (delete/add operations) on documents
of different sizes. I have a use case in which I can either create
large Solr documents first and subsequently add a small amount of
information to them, or do the opposite (add the small doc first, then
update with the big one.) My guess is that adding smaller ones first
will be faster, since the time to delete a large document is
presumably longer than the time to delete a small one.

Thanks,
Jeff


Re: updating documents while keeping unspecified fields

2011-07-07 Thread Juan Grande
Hi Adeel,

As far as I know, this isn't possible yet, but some work is being done:

https://issues.apache.org/jira/browse/SOLR-139
https://issues.apache.org/jira/browse/SOLR-828

Regards,

*Juan*



On Thu, Jul 7, 2011 at 2:24 PM, Adeel Qureshi wrote:

> What I am trying to do is to update a document's information while keeping
> the data for the fields that aren't specified in the update.
>
> So e.g. if this is the schema
>
> 
> 123
> some title
> active
> 
>
> if i send
>
> 
> 123
> closed
> 
>
> it should update the status to closed for this document but not wipe out
> the title, since it wasn't provided in the updated data. Is that possible
> using some flags or something?
>
> Thanks
> Adeel
>
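
Until partial updates are available, one workaround is to merge the change on the client side and resend the whole document. The sketch below assumes the full document can be read from a system of record (here a plain dictionary); the function name and the "id" field name are hypothetical, while "title" and "status" follow Adeel's example.

```python
def merge_update(record, changes):
    """Return a full document: the known record with changed fields overridden."""
    merged = dict(record)   # copy so the system of record is not mutated
    merged.update(changes)
    return merged


# Hypothetical system of record keyed by document id.
system_of_record = {
    "123": {"id": "123", "title": "some title", "status": "active"},
}

full_doc = merge_update(system_of_record["123"], {"status": "closed"})
print(full_doc)  # title is kept, status is changed
```

The merged document is then sent as a normal add, which replaces the old version in the index.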


updating documents while keeping unspecified fields

2011-07-07 Thread Adeel Qureshi
What I am trying to do is to update a document's information while keeping
the data for the fields that aren't specified in the update.

So e.g. if this is the schema


123
some title
active


if i send


123
closed


it should update the status to closed for this document but not wipe out
the title, since it wasn't provided in the updated data. Is that possible
using some flags or something?

Thanks
Adeel


Re: updating documents in solr 1.3.0

2008-10-16 Thread Bill Au
This is being worked on for Solr 1.4:

https://issues.apache.org/jira/browse/SOLR-139

Bill

On Wed, Oct 15, 2008 at 7:47 PM, Walter Underwood <[EMAIL PROTECTED]>wrote:

> Neither Solr nor Lucene support partial updates. "Update" means
> "add or replace". --wunder
>
> On 10/15/08 4:23 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >   I've been trying to find a way to post partial updates, updating only
> > some of the fields in a set of records, via POSTed XML messages to a solr
> > 1.3.0 index. In the wiki (http://wiki.apache.org/solr/UpdateXmlMessages),
> > it almost seems like there's a special  root tag which isn't
> > mentioned anywhere else. Am I correct in assuming that no such  tag
> > exists?
> >
> > Thanks in advance,
> >
> > Evan Kelsey
> >
> >
> > http://www.mintel.com
> > providing insight + impact
> >
> > Chicago Office:
> > Mintel International Group Ltd (Mintel)
> > 351 Hubbard Street, Floor 8
> > Chicago, IL 60610
> > USA
> >
> > Tel: 312 932 0400
> > Fax: 312 932 0469
> >
> > London Office:
> > Mintel International Group Ltd (Mintel)
> > 18-19 Long Lane
> > London
> > EC1A 9PL
> > UK
> >
> > Tel: 020 7606 4533
> > Fax: 020 7606 5932
> >
> >
> > Notice
> > 
> > This email may contain information that is privileged,
> > confidential or otherwise protected from disclosure. It
> > must not be used by, or its contents copied or disclosed
> > to, persons other than the addressee. If you have received
> > this email in error please notify the sender immediately
> > and delete the email. Any views or opinions expressed in
> > this message are solely those of the author, and do not
> > necessarily reflect those of Mintel.
> >
> > No Mintel staff are authorised to make purchases using
> > email or over the internet, and any contracts so performed
> > are invalid.
> >
> > Warning
> > **
> > It is the responsibility of the recipient to ensure that
> > the onward transmission, opening or use of this message
> > and any attachments will not adversely affect their systems
> > or data. Please carry out such virus and other checks, as
> > you consider appropriate.
> >
>
>
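
The "add or replace" semantics Walter describes can be modeled in a few lines. This is a toy illustration with a dictionary standing in for the index, not Solr's implementation; the function and field names are hypothetical. Posting a document whose key already exists replaces the whole document, so fields left out of the new version are gone.

```python
def add_or_replace(index, doc, key="id"):
    """Store `doc` under its key, replacing any earlier version entirely."""
    index[doc[key]] = dict(doc)


index = {}
add_or_replace(index, {"id": "123", "title": "some title", "status": "active"})
add_or_replace(index, {"id": "123", "status": "closed"})  # "partial" document
print(index["123"])  # the title from the first version is not carried over
```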


Re: updating documents in solr 1.3.0

2008-10-15 Thread Walter Underwood
Neither Solr nor Lucene support partial updates. "Update" means
"add or replace". --wunder

On 10/15/08 4:23 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> Hi,
>   I've been trying to find a way to post partial updates, updating only
> some of the fields in a set of records, via POSTed XML messages to a solr
> 1.3.0 index. In the wiki (http://wiki.apache.org/solr/UpdateXmlMessages),
> it almost seems like there's a special  root tag which isn't
> mentioned anywhere else. Am I correct in assuming that no such  tag
> exists?
> 
> Thanks in advance,
> 
> Evan Kelsey



updating documents in solr 1.3.0

2008-10-15 Thread EKelsey
Hi,
  I've been trying to find a way to post partial updates, updating only
some of the fields in a set of records, via POSTed XML messages to a solr
1.3.0 index. In the wiki (http://wiki.apache.org/solr/UpdateXmlMessages),
it almost seems like there's a special  root tag which isn't
mentioned anywhere else. Am I correct in assuming that no such  tag
exists?

Thanks in advance,

Evan Kelsey





Updating documents in Solr

2008-04-17 Thread nutchvf

Hi!
Is there any option to update a field (or a set of fields) of a document
indexed in Solr without having to update all the fields of the entire
document?
I have seen the SOLR-139 patch, but I do not know the proper syntax
of the command (or the XML to post) to update the document. I would
appreciate any suggestions.

For example,something like this:


SOLR1000
9



Regards..
-- 
View this message in context: 
http://www.nabble.com/Updating-documents-in-Solr-tp16742850p16742850.html