Re: Mark document as hidden
Thanks Jack.

I finally managed to replicate the external files with my own replication handler. But now there is an issue with Solr in the update log replay process: the default processor chain is not used, which means that my processor, which manages the external files, is not invoked.

I have created a Jira issue for this: https://issues.apache.org/jira/browse/SOLR-4608

Ludovic.
Jouve, France.
--
View this message in context: http://lucene.472066.n3.nabble.com/Mark-document-as-hidden-tp4045756p4048622.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Mark document as hidden
Another technique to get around the ZooKeeper size issue is to split a larger file into smaller pieces and then combine them in your code that moves them to the index. OTOH, compression might be a better approach.

In any case, it sounds like it is worth a Jira to propose that a better solution is needed to support EFF (ExternalFileField) in SolrCloud.

-- Jack Krupansky

-----Original Message-----
From: lboutros
Sent: Sunday, March 17, 2013 10:13 AM
To: solr-user@lucene.apache.org
Subject: Re: Mark document as hidden

Oh, I see :) I did not quite catch what you said.

Well, my index could contain 80 million elements, and a large number of them could be hidden. As you already said, I don't think that ZooKeeper is the right place to store these files; they are too big.

Thank you again, that gave me some ideas that I will experiment with.

Ludovic.
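Jack's suggestion of splitting a larger file into smaller pieces and recombining them can be sketched roughly as follows. This is only an illustration using the plain JDK: the class name `ChunkedFile` and the chunk size are assumptions, and storing or fetching the pieces (for example in ZooKeeper) is deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits a payload into bounded pieces and rejoins them. */
final class ChunkedFile {
    private ChunkedFile() {}

    /** Splits data into consecutive chunks of at most maxChunkBytes each. */
    static List<byte[]> split(byte[] data, int maxChunkBytes) {
        if (maxChunkBytes <= 0) {
            throw new IllegalArgumentException("maxChunkBytes must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < data.length; start += maxChunkBytes) {
            int end = Math.min(data.length, start + maxChunkBytes);
            chunks.add(Arrays.copyOfRange(data, start, end));
        }
        return chunks;
    }

    /** Concatenates the chunks in list order to rebuild the original payload. */
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

The pieces have to be rejoined in the same order in which they were produced, so whatever stores them needs a stable ordering, such as a numeric suffix per piece.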
Re: Mark document as hidden
Oh, I see :) I did not quite catch what you said.

Well, my index could contain 80 million elements, and a large number of them could be hidden. As you already said, I don't think that ZooKeeper is the right place to store these files; they are too big.

Thank you again, that gave me some ideas that I will experiment with.

Ludovic.
Jouve, France.
Re: Mark document as hidden
Uh, no, I never suggested that "All files in the index directory are replicated". I simply said that if you created your file as a configuration file and let ZooKeeper propagate it as a configuration file, THEN YOU could write a handler/component which would copy from the configuration directory to the index directory.

One issue with even that would be that ZooKeeper has a very restrictive limit on file size: 1 MB. You could compress files or reconfigure ZooKeeper to use a larger size (jute.maxbuffer), but raising the limit is not considered a "safe" option.

-- Jack Krupansky

-----Original Message-----
From: lboutros
Sent: Sunday, March 17, 2013 7:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Mark document as hidden

Thanks, Jack, for your answers.

Are all files in the index directory replicated? I thought that only the Lucene index files were replicated. If you are right, that's great, because I could create an ExternalFileField type that reads its input file from the index directory rather than from the data directory.

But sadly, the replication handler contains this code:

    Collection files = new HashSet(commit.getFileNames());

Therefore I think that this is not currently the case.

Ludovic.
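The compression option Jack mentions could be checked along these lines before trying to store a file. This is a rough sketch with the JDK's gzip classes: the class name `ZkSizeCheck` is hypothetical, and the limit constant simply reuses the 1 MB figure quoted in this thread rather than reading ZooKeeper's actual configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip a payload and test it against an assumed size limit. */
final class ZkSizeCheck {
    /** Assumption for illustration: the 1 MB limit quoted in the thread. */
    static final int ASSUMED_LIMIT_BYTES = 1024 * 1024;

    private ZkSizeCheck() {}

    /** Returns the gzip-compressed form of raw. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reverses gzip(), returning the original bytes. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] block = new byte[8192];
            int read;
            while ((read = gz.read(block)) != -1) {
                out.write(block, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** True if the compressed payload is smaller than the assumed limit. */
    static boolean fitsAfterCompression(byte[] raw) {
        return gzip(raw).length < ASSUMED_LIMIT_BYTES;
    }
}
```

How much a real external file shrinks depends on its keys and values, so the check has to be run against the actual data; a file with tens of millions of entries may well stay above the limit even when compressed.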
Re: Mark document as hidden
Thanks, Jack, for your answers.

Are all files in the index directory replicated? I thought that only the Lucene index files were replicated. If you are right, that's great, because I could create an ExternalFileField type that reads its input file from the index directory rather than from the data directory.

But sadly, the replication handler contains this code:

    Collection files = new HashSet(commit.getFileNames());

The file list is taken from the commit point, so I think that this is not currently the case.

Ludovic.
Jouve, France.
Re: Mark document as hidden
Ah, yes, with SolrCloud, configuration files are kept in ZooKeeper: http://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper

And, yes, EFF reads from the index directory. Maybe you could have a custom handler/component that simply copies the EFF file(s) from "conf" to the index dir.

-- Jack Krupansky

-----Original Message-----
From: lboutros
Sent: Saturday, March 16, 2013 7:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Mark document as hidden

Hi Jack,

The external files involved in External File Fields are not stored in the configuration directory and cannot be replicated this way. Furthermore, in SolrCloud, additional files are no longer replicated. There is something like this in the code:

    if (confFileNameAlias.size() < 1 || core.getCoreDescriptor().getCoreContainer().isZooKeeperAware())
      return;
    LOG.debug("Adding config files to list: " + includeConfFiles);
    //if configuration files need to be included get their details
    rsp.add(CONF_FILES, getConfFileInfoFromCache(confFileNameAlias, confFileInfoCache));

Am I wrong?

Ludovic.
Re: Mark document as hidden
Hi Jack,

The external files involved in External File Fields are not stored in the configuration directory and cannot be replicated this way. Furthermore, in SolrCloud, additional files are no longer replicated. There is something like this in the code:

    if (confFileNameAlias.size() < 1 || core.getCoreDescriptor().getCoreContainer().isZooKeeperAware())
      return;
    LOG.debug("Adding config files to list: " + includeConfFiles);
    //if configuration files need to be included get their details
    rsp.add(CONF_FILES, getConfFileInfoFromCache(confFileNameAlias, confFileInfoCache));

Am I wrong?

Ludovic.
Jouve, France.
Re: Mark document as hidden
The /replication handler in solrconfig.xml has a commented-out "master" section which has a "confFiles" element which specifies which configuration files to replicate:

    <str name="confFiles">schema.xml,stopwords.txt</str>

You can add your external file to that comma-separated list.

-- Jack Krupansky

-----Original Message-----
From: lboutros
Sent: Saturday, March 16, 2013 6:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Mark document as hidden

OK, I have created a processor which manages updates to the external file. Basically, until a commit is requested, the hidden document IDs are stored in a Set. When a commit is requested, a new file is created by copying the last one, and the additional IDs are then appended to it.

Now I have a problem in my tests: when the "ChaosMonkey" stops one of the test cores and PeerSync is not possible during the recovery process, replication does not copy the external file.

Do I have to create my own replication handler, or is there a way to force the replication of these files?

Ludovic.
Re: Mark document as hidden
OK, I have created a processor which manages updates to the external file. Basically, until a commit is requested, the hidden document IDs are stored in a Set. When a commit is requested, a new file is created by copying the last one, and the additional IDs are then appended to it.

Now I have a problem in my tests: when the "ChaosMonkey" stops one of the test cores and PeerSync is not possible during the recovery process, replication does not copy the external file.

Do I have to create my own replication handler, or is there a way to force the replication of these files?

Ludovic.
Jouve, France.
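The buffering scheme Ludovic describes (collect hidden IDs in a Set, then on commit copy the previous file and append the new IDs) might look roughly like the sketch below. It is not his actual processor: the class `HiddenIdBuffer`, its method names, and the placeholder value `1` written for each key are all assumptions, and the hook into Solr's update processor chain is omitted entirely.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical buffer of hidden document IDs, written out on commit. */
final class HiddenIdBuffer {
    // Insertion-ordered set: repeated IDs between two commits are stored once.
    private final Set<String> pending = new LinkedHashSet<>();

    /** Called for each document marked hidden between two commits. */
    synchronized void markHidden(String docId) {
        pending.add(docId);
    }

    /**
     * Called on commit: copies the previous file (if any) to the new file,
     * appends one "id=1" line per pending ID, then clears the buffer.
     * The value 1 is a placeholder chosen for this sketch.
     */
    synchronized void flushTo(Path previous, Path next) throws IOException {
        if (previous != null && Files.exists(previous)) {
            Files.copy(previous, next, StandardCopyOption.REPLACE_EXISTING);
        }
        try (BufferedWriter writer = Files.newBufferedWriter(next, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String id : pending) {
                writer.write(id + "=1\n");
            }
        }
        pending.clear();
    }
}
```

Writing each commit to a new file rather than appending in place means a reader never sees a half-written file under the old name. The sketch does not remove duplicates across commits, so an ID hidden twice in different commits would appear twice in the file.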
Re: Mark document as hidden
Seems like that technique would work, as long as the file is saved and flushed before the actual commit occurs.

Erik

On Mar 8, 2013, at 12:17, lboutros wrote:

> I could create an UpdateRequestProcessorFactory that could update this file.
> Would that be better?
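Erik's condition, that the file is saved and flushed before the commit occurs, can be approximated with the JDK alone. In the sketch below the class name `DurableWrite` and the temporary-file naming are assumptions; it writes to a sibling temp file, forces it to storage, and only then renames it into place.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: write a file and force it to disk before it becomes visible. */
final class DurableWrite {
    private DurableWrite() {}

    /** Writes content to a temp file, syncs it, then moves it over target. */
    static void writeAndSync(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // Ask the OS to flush file data and metadata to the storage device.
            channel.force(true);
        }
        // Replace the target only after the temp file is fully written and synced.
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A caller would invoke `writeAndSync` and let the commit proceed only after it returns without an exception.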
Re: Mark document as hidden
I could create an UpdateRequestProcessorFactory that updates this file. Would that be better?

Ludovic.
Jouve, France.
Re: Mark document as hidden
OK, thanks Erik. Do you see any problem with modifying the update handler so that it appends some values to this file?

Ludovic.
Jouve, France.
Re: Mark document as hidden
The external file is maintained externally. Solr only reads it and has no facility to write to it, if that is what you're asking.

Erik

On Mar 8, 2013, at 10:43, lboutros wrote:

> One more question: is there already a way to update the external file (add
> values) in Solr?
>
> Ludovic.
Re: Mark document as hidden
Ludovic, yes, this filter query would be cached (unless you specify cache=false).

Erik

On Mar 8, 2013, at 10:26, lboutros wrote:

> Excellent, Erik! It works perfectly.
>
> "Normal" filter queries are cached. Is it the same for frange filter queries
> like this one?
>
> fq={!frange l=0 u=10}removed_revision
>
> Thanks to both of you for your answers.
>
> Ludovic.
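To make the two variants Erik refers to concrete, here is a small sketch that builds the `fq` value with and without the `cache=false` local parameter. The helper class `FrangeFilter` is hypothetical; the field name and bounds come from Ludovic's query.

```java
/** Hypothetical helper that assembles an frange filter query string. */
final class FrangeFilter {
    private FrangeFilter() {}

    /**
     * Builds a value for the fq parameter, for example
     * {!frange l=0 u=10}removed_revision. When cache is false, the
     * cache=false local parameter is added so the filter is not cached.
     */
    static String build(String function, int lower, int upper, boolean cache) {
        StringBuilder fq = new StringBuilder("{!frange l=")
                .append(lower)
                .append(" u=")
                .append(upper);
        if (!cache) {
            fq.append(" cache=false");
        }
        return fq.append('}').append(function).toString();
    }
}
```

`FrangeFilter.build("removed_revision", 0, 10, true)` gives the query from the thread, and passing `false` gives `{!frange l=0 u=10 cache=false}removed_revision`. Either string still needs URL encoding when sent as an HTTP request parameter.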
Re: Mark document as hidden
One more question: is there already a way to update the external file (add values) in Solr?

Ludovic.
Jouve, France.
Re: Mark document as hidden
Excellent, Erik! It works perfectly.

"Normal" filter queries are cached. Is it the same for frange filter queries like this one?

fq={!frange l=0 u=10}removed_revision

Thanks to both of you for your answers.

Ludovic.
Jouve, France.
Re: Mark document as hidden
External file fields, via function queries, are still usable for filtering. Consider using the frange (function range) query parser to filter out hidden documents.

Erik

On Mar 8, 2013, at 6:40, lboutros wrote:

> Dear all,
>
> I would like to mark documents as hidden.
> I could add a "hidden" field and set its value to "true", but then the whole
> document would have to be reindexed.
> And external file fields are not searchable.
> I could store the document keys in an external database and filter the
> results with these IDs. But with several million hidden documents, I
> don't think that is a great idea.
>
> For now I will reindex the documents, but if someone has a better idea,
> any help will be appreciated.
>
> Ludovic.
Re: Mark document as hidden
Without Java coding, you cannot filter on things that aren't in your index. You would need to re-index the document, but maybe you could make use of atomic updates to change just the hidden field without needing to push the whole document again.

Upayavira

On Fri, Mar 8, 2013, at 11:40 AM, lboutros wrote:

> Dear all,
>
> I would like to mark documents as hidden.
> I could add a "hidden" field and set its value to "true", but then the whole
> document would have to be reindexed.
> And external file fields are not searchable.
> I could store the document keys in an external database and filter the
> results with these IDs. But with several million hidden documents, I
> don't think that is a great idea.
>
> For now I will reindex the documents, but if someone has a better idea,
> any help will be appreciated.
>
> Ludovic.
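As an illustration of the atomic update Upayavira mentions, the sketch below builds the JSON body for a "set" operation on a single field. The class `AtomicUpdate` is hypothetical, the field names `id` and `hidden` are assumptions taken from the question, and sending the body to Solr's update handler is left out.

```java
/** Hypothetical helper that builds the JSON body of a Solr atomic "set" update. */
final class AtomicUpdate {
    private AtomicUpdate() {}

    /** Returns a body such as [{"id":"doc42","hidden":{"set":true}}]. */
    static String setHidden(String docId, boolean hidden) {
        return "[{" + quote("id") + ":" + quote(docId) + ","
                + quote("hidden") + ":{" + quote("set") + ":" + hidden + "}}]";
    }

    // Minimal escaping for this sketch: only backslashes and double quotes.
    private static String quote(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Only the changed field travels over the wire, but Solr still rewrites the whole document internally from its stored fields, so the other fields need to be stored for this to work.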
Mark document as hidden
Dear all,

I would like to mark documents as hidden. I could add a "hidden" field and set its value to "true", but then the whole document would have to be reindexed. And external file fields are not searchable.

I could store the document keys in an external database and filter the results with these IDs. But with several million hidden documents, I don't think that is a great idea.

For now I will reindex the documents, but if someone has a better idea, any help will be appreciated.

Ludovic.
Jouve, France.