Re: Mark document as hidden

2013-03-18 Thread lboutros
Thanks Jack.

I finally managed to replicate the external files with my own replication
handler.

But now there's an issue with Solr during the update log replay process.

The default update processor chain is not used during replay, which means that
my processor, which manages the external files, is not used either...

I have created a Jira issue for this:

https://issues.apache.org/jira/browse/SOLR-4608

Ludovic.



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mark-document-as-hidden-tp4045756p4048622.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Mark document as hidden

2013-03-17 Thread Jack Krupansky
Another technique to get around the ZooKeeper size limit is to split a 
larger file into smaller pieces and then recombine them in your code that moves 
them to the index. OTOH, compression might be a better approach.
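Jack's split-and-recombine idea, sketched with plain byte arrays under an assumed budget of 1 MB per piece. The class and method names are hypothetical, and uploading the pieces to ZooKeeper (and reading them back) is deliberately left out:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkedFile {
    // Assumed per-piece budget, matching ZooKeeper's default ~1 MB node limit.
    static final int MAX_CHUNK = 1024 * 1024;

    /** Splits the file contents into consecutive pieces of at most maxChunk bytes. */
    static List<byte[]> split(byte[] data, int maxChunk) {
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < data.length; start += maxChunk) {
            int end = Math.min(data.length, start + maxChunk);
            chunks.add(Arrays.copyOfRange(data, start, end));
        }
        return chunks;
    }

    /** Recombines the pieces, in their original order, into the full file contents. */
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[2_500_000]; // example payload, about 2.4 MiB
        List<byte[]> chunks = split(data, MAX_CHUNK);
        System.out.println(chunks.size() + " chunks"); // prints "3 chunks"
        System.out.println(Arrays.equals(data, join(chunks))); // prints "true"
    }
}
```

The pieces only recombine correctly in the order they were produced, so in practice they would need ordered names (for example a numeric suffix).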


In any case, it sounds like it is worth opening a Jira issue proposing that a 
better solution is needed to support EFF (ExternalFileField) in SolrCloud.


-- Jack Krupansky

-Original Message- 
From: lboutros

Sent: Sunday, March 17, 2013 10:13 AM
To: solr-user@lucene.apache.org
Subject: Re: Mark document as hidden

Oh, I see :) I misunderstood what you said.

Well, my index could contain 80 million documents, and a large number of
them could be hidden.
As you already said, I don't think that ZooKeeper is the right place to
store these files; they are too big.
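A back-of-the-envelope check of "too big", assuming (hypothetically) 10 million hidden documents, 12-character keys and a one-character value per line of the external file (`key=value` plus a newline):

```java
public class ExternalFileSizeEstimate {
    /** Bytes for one "key=value\n" line, assuming ASCII keys and values. */
    static long bytesPerLine(int keyLength, int valueLength) {
        return keyLength + 1 + valueLength + 1; // key + '=' + value + '\n'
    }

    /** Total file size for the given number of hidden documents. */
    static long estimateBytes(long hiddenDocs, int keyLength, int valueLength) {
        return hiddenDocs * bytesPerLine(keyLength, valueLength);
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(10_000_000L, 12, 1); // hypothetical figures
        long zkLimit = 1024L * 1024L; // ZooKeeper's default ~1 MB limit
        System.out.println(bytes);           // prints 150000000
        System.out.println(bytes / zkLimit); // prints 143 (times over the limit)
    }
}
```

Under these assumptions the file is already about 143 times the default limit, well before reaching all 80 million documents.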

Thank you again, that gave me some ideas to experiment with.

Ludovic.



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mark-document-as-hidden-tp4045756p4048222.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Mark document as hidden

2013-03-17 Thread Jack Krupansky
Uh, no, I never suggested that "All files in the index directory are 
replicated". I simply said that if you created your file as a configuration 
file and let Zookeeper propagate it as a configuration file, THEN YOU could 
write a handler/component which would copy from the configuration directory 
to the index directory.


One issue with even that would be that ZooKeeper has a very restrictive 
limit on file size: 1 MB. You could compress files or reconfigure ZooKeeper 
to use a larger size (jute.maxbuffer), but that is not considered a "safe" 
option.
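A minimal sketch of the compression option using the JDK's GZIP streams. The 1 MB constant is an assumption standing in for ZooKeeper's default `jute.maxbuffer`, the sample keys are made up, and the code that would later decompress the file next to the index is not shown:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedConfFile {
    static final int ZK_LIMIT = 1024 * 1024; // assumed default jute.maxbuffer (~1 MB)

    /** Compresses the raw file contents with GZIP. */
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(raw);
        } // closing the GZIP stream writes its trailer
        return bytes.toByteArray();
    }

    /** Restores the original contents from GZIP-compressed bytes. */
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200_000; i++) {
            sb.append("doc").append(i).append("=1\n"); // hypothetical keys
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println(raw.length + " -> " + packed.length
                + " bytes, under limit: " + (packed.length < ZK_LIMIT));
    }
}
```

How well this works depends on how repetitive the keys are; compression postpones the limit rather than removing it.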


-- Jack Krupansky

-Original Message- 
From: lboutros

Sent: Sunday, March 17, 2013 7:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Mark document as hidden

Thanks Jack for your answers.

Are all files in the index directory replicated? I thought that only the
Lucene index files were replicated.
If you are right, that's great, because I could create an ExternalFileField
type that gets its input file from the index directory rather than from
the data directory.

But sadly, the replication handler contains this code:

    Collection files = new HashSet(commit.getFileNames());

So I don't think that is currently the case.

Ludovic.



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mark-document-as-hidden-tp4045756p4048205.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Mark document as hidden

2013-03-16 Thread Jack Krupansky

Ah, yes, with SolrCloud... configuration files are kept in Zookeeper:

http://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper

And, yes, EFF reads from the index directory.

Maybe you could have a custom handler/component that simply copied the EFF 
file(s) from "conf" to the index dir.
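A minimal sketch of such a copy step with `java.nio.file`, assuming the external files follow the `external_<fieldname>` naming convention and that the caller passes in whichever directory the field actually reads from. The class and method names are hypothetical, and hooking this into a Solr handler or component is not shown:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CopyExternalFiles {
    /**
     * Copies every regular file named external_* from confDir into targetDir,
     * replacing older copies. Returns the number of files copied.
     */
    static int copyExternalFiles(Path confDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        int copied = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(confDir, "external_*")) {
            for (Path source : files) {
                if (!Files.isRegularFile(source)) {
                    continue;
                }
                // Copy under a name that does not start with "external_", then rename,
                // so a reader never picks up a half-copied file.
                Path tmp = targetDir.resolve("tmp-" + source.getFileName());
                Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
                // Replacing an existing target with ATOMIC_MOVE is platform-dependent
                // (it works on POSIX systems).
                Files.move(tmp, targetDir.resolve(source.getFileName().toString()),
                        StandardCopyOption.ATOMIC_MOVE);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        Path conf = Files.createTempDirectory("conf");
        Path data = Files.createTempDirectory("data");
        Files.write(conf.resolve("external_removed_revision"),
                "doc1=3\n".getBytes(StandardCharsets.UTF_8));
        Files.write(conf.resolve("schema.xml"), new byte[0]); // ignored: no external_ prefix
        System.out.println(copyExternalFiles(conf, data)); // prints 1
    }
}
```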


-- Jack Krupansky

-Original Message- 
From: lboutros

Sent: Saturday, March 16, 2013 7:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Mark document as hidden

Hi Jack,

The external files used by External File Fields are not stored in the
configuration directory, so they cannot be replicated this way. Furthermore,
in SolrCloud, additional files are no longer replicated.

The code contains something like this:

    if (confFileNameAlias.size() < 1 ||
        core.getCoreDescriptor().getCoreContainer().isZooKeeperAware())
      return;
    LOG.debug("Adding config files to list: " + includeConfFiles);
    //if configuration files need to be included get their details
    rsp.add(CONF_FILES, getConfFileInfoFromCache(confFileNameAlias,
        confFileInfoCache));

Am I wrong?

Ludovic.



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mark-document-as-hidden-tp4045756p4048128.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Mark document as hidden

2013-03-16 Thread Jack Krupansky
The /replication handler in solrconfig.xml has a commented-out "master" 
section with a "confFiles" element that specifies which configuration 
files to replicate:

<str name="confFiles">schema.xml,stopwords.txt</str>

You can add your external file to that comma-separated list.

-- Jack Krupansky

-Original Message- 
From: lboutros

Sent: Saturday, March 16, 2013 6:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Mark document as hidden

OK, I have created a processor that updates the external file.

Basically, until a commit is requested, the hidden document IDs are stored
in a Set. When a commit is requested, a new file is created by copying the
previous one, and the additional IDs are appended to it.
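A minimal sketch of that buffering scheme, independent of Solr's APIs: IDs are collected in a Set and, at commit time, the previous file is copied to a new file before the buffered IDs are appended. The class name, the fixed value written for every hidden ID, and the way file names are chosen are all assumptions:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; in Solr it would sit behind a custom update processor. */
public class HiddenIdBuffer {
    private final Set<String> pending = new LinkedHashSet<>(); // insertion order, no duplicates
    private final String hiddenValue;

    HiddenIdBuffer(String hiddenValue) {
        this.hiddenValue = hiddenValue;
    }

    /** Records a document to hide; nothing is written until commit. */
    synchronized void hide(String id) {
        pending.add(id);
    }

    /**
     * Copies previousFile (if any) to newFile, appends one "id=value" line per
     * buffered ID, clears the buffer and returns the number of lines appended.
     */
    synchronized int commit(Path previousFile, Path newFile) throws IOException {
        if (previousFile != null && Files.exists(previousFile)) {
            Files.copy(previousFile, newFile); // fails if newFile already exists
        } else {
            Files.createFile(newFile);
        }
        List<String> lines = new ArrayList<>();
        for (String id : pending) {
            lines.add(id + "=" + hiddenValue);
        }
        Files.write(newFile, lines, StandardCharsets.UTF_8, StandardOpenOption.APPEND);
        int appended = pending.size();
        pending.clear();
        return appended;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("eff");
        HiddenIdBuffer buffer = new HiddenIdBuffer("1");
        buffer.hide("doc1");
        buffer.hide("doc2");
        Path first = dir.resolve("external_hidden.1");
        buffer.commit(null, first);
        buffer.hide("doc3");
        Path second = dir.resolve("external_hidden.2");
        buffer.commit(first, second);
        System.out.println(Files.readAllLines(second, StandardCharsets.UTF_8));
        // prints [doc1=1, doc2=1, doc3=1]
    }
}
```

Leaving the previous file untouched until the new one is complete means a failed commit still leaves the last good file in place.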

Now I have a problem in my tests: when the "ChaosMonkey" stops one of the
test cores and peer sync is not possible during the recovery process,
replication does not copy the external file.
Do I have to create my own replication handler, or is there a way to force
the replication of these files?

Ludovic.



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mark-document-as-hidden-tp4045756p4048125.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Mark document as hidden

2013-03-10 Thread Erik Hatcher
Seems like that technique would work, as long as the file is saved and flushed 
before the actual commit occurs.
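One way to satisfy "saved and flushed before the actual commit" is to write the complete file under a temporary name, force it to disk, and only then rename it into place and let the commit proceed. A minimal sketch with `java.nio`; the `tmp-` prefix is an assumption, and fsyncing the parent directory after the rename is omitted:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class DurableWrite {
    /** Writes content to target via a temporary file that is forced to disk first. */
    static void writeAndSync(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling("tmp-" + target.getFileName());
        ByteBuffer buf = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush file contents and metadata to the storage device
        }
        // Rename only after the bytes are durable; replacing an existing target
        // with ATOMIC_MOVE is platform-dependent (it works on POSIX systems).
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sync");
        Path target = dir.resolve("external_hidden");
        writeAndSync(target, "doc1=1\ndoc2=1\n");
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8).length());
        // prints 14
    }
}
```

In an update processor, this call would presumably happen in `processCommit`, before the commit is handed on to the next processor in the chain.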

Erik

On Mar 8, 2013, at 12:17 , lboutros wrote:

> I could create an UpdateRequestProcessorFactory that updates this file.
> Would that be better?
> 
> 
> 
> -
> Jouve
> France.
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Mark-document-as-hidden-tp4045756p4045842.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Mark document as hidden

2013-03-08 Thread lboutros
OK, thanks Erik.

Do you see any problem with modifying the update handler to append
some values to this file?

Ludovic



-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mark-document-as-hidden-tp4045756p4045839.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Mark document as hidden

2013-03-08 Thread Erik Hatcher
The external file is maintained externally.  Solr only reads it, and does not 
have a facility to write to it, if that is what you're asking.  
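Since Solr only reads the file, it may help to see what that reading amounts to. The following is a simplified model, not Solr's actual loader: each `key=value` line maps a document key to a number, and keys missing from the file fall back to a default, analogous to the field type's `defVal` attribute. The names and the handling of malformed lines are assumptions of this sketch:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class ExternalFileReader {
    /** Parses "key=value" lines; blank, malformed or non-numeric lines are skipped. */
    static Map<String, Float> parse(String content) throws IOException {
        Map<String, Float> values = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = in.readLine()) != null) {
                int eq = line.lastIndexOf('='); // split on the last '=' in the line
                if (eq <= 0) {
                    continue; // no key, or no '=' at all
                }
                try {
                    values.put(line.substring(0, eq).trim(),
                            Float.parseFloat(line.substring(eq + 1).trim()));
                } catch (NumberFormatException e) {
                    // value is not a number: ignore the line
                }
            }
        }
        return values;
    }

    /** Value for a document key, or defVal when the file has no entry for it. */
    static float valueFor(Map<String, Float> values, String key, float defVal) {
        Float v = values.get(key);
        return v != null ? v : defVal;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Float> values = parse("doc1=3\ndoc2=7.5\n");
        System.out.println(valueFor(values, "doc2", 0f)); // prints 7.5
        System.out.println(valueFor(values, "doc9", 0f)); // prints 0.0
    }
}
```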

Erik

On Mar 8, 2013, at 10:43 , lboutros wrote:

> One more question: is there already a way in Solr to update the external
> file (add values)?
> 
> Ludovic.
> 
> 
> 
> -
> Jouve
> France.
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Mark-document-as-hidden-tp4045756p4045823.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Mark document as hidden

2013-03-08 Thread Erik Hatcher
Ludovic -

Yes, this query would be cached (unless you say cache=false).  
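For reference, the caching switch is just another local parameter inside the braces. A small helper that assembles such `fq` values; the helper itself is hypothetical, while `l`, `u` and `cache` are the local parameters already used in this thread:

```java
public class FrangeFilter {
    /** Builds an frange filter; with cache=false Solr does not store it in the filterCache. */
    static String frange(String field, String lower, String upper, boolean cache) {
        StringBuilder fq = new StringBuilder("{!frange l=").append(lower)
                .append(" u=").append(upper);
        if (!cache) {
            fq.append(" cache=false");
        }
        return fq.append('}').append(field).toString();
    }

    public static void main(String[] args) {
        System.out.println(frange("removed_revision", "0", "10", true));
        // prints {!frange l=0 u=10}removed_revision
        System.out.println(frange("removed_revision", "0", "10", false));
        // prints {!frange l=0 u=10 cache=false}removed_revision
    }
}
```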

Erik

On Mar 8, 2013, at 10:26 , lboutros wrote:

> Excellent, Erik! It works perfectly.
> 
> "Normal" filter queries are cached. Is the same true for frange filter
> queries like this one?
> 
> fq={!frange l=0 u=10}removed_revision
> 
> Thanks to both for your answers.
> 
> Ludovic.
> 
> 
> 
> -
> Jouve
> France.
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Mark-document-as-hidden-tp4045756p4045817.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Mark document as hidden

2013-03-08 Thread Erik Hatcher
External file fields, via function queries, are still usable for filtering.  
Consider using the frange function query to filter out hidden documents. 
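To make the idea concrete, here is a small in-memory model of what such a filter computes, not Solr's implementation: every document gets the external value for its key (or a default when absent), and `frange` keeps documents whose value falls inside `[l, u]`, bounds included by default. The hypothetical setup marks hidden documents with 1 and leaves everything else at a default of 0, so `{!frange l=0 u=0}` keeps only visible documents:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FrangeModel {
    /** Keeps the IDs whose external value (or defVal) lies in [lower, upper]. */
    static List<String> filter(List<String> docIds, Map<String, Float> external,
                               float defVal, float lower, float upper) {
        List<String> kept = new ArrayList<>();
        for (String id : docIds) {
            Float v = external.get(id);
            float value = (v != null) ? v : defVal;
            if (value >= lower && value <= upper) { // inclusive bounds
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Float> hidden = new HashMap<>();
        hidden.put("doc2", 1f); // doc2 is listed in the external file as hidden
        List<String> docs = Arrays.asList("doc1", "doc2", "doc3");
        System.out.println(filter(docs, hidden, 0f, 0f, 0f)); // prints [doc1, doc3]
    }
}
```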

Erik

On Mar 8, 2013, at 6:40, lboutros  wrote:

> Dear all,
> 
> I would like to mark documents as hidden.
> I could add a field "hidden" and set its value to "true", but then the
> whole documents would be reindexed.
> And External File Fields are not searchable.
> I could store the document keys in an external database and filter the
> results with these IDs. But if I have several million hidden documents, I
> don't think that is a great idea.
> 
> For now I will reindex the documents, but if someone has a better idea,
> any help will be appreciated.
> 
> Ludovic.
> 
> 
> 
> -
> Jouve
> France.
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Mark-document-as-hidden-tp4045756.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Mark document as hidden

2013-03-08 Thread Upayavira
Without Java coding, you cannot filter on things that aren't in your
index. You would need to re-index the document, but maybe you could make
use of atomic updates to just change the hidden field without needing to
push the whole document again.
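For the atomic-update route, the request body only needs the unique key and a `set` operation on the flag. A minimal sketch that builds such a JSON body as a string; the `hidden` field name is hypothetical. Solr still rewrites the whole document internally, and atomic updates require the document's other fields to be stored (and the update log to be enabled):

```java
import java.util.Arrays;
import java.util.List;

public class AtomicHideRequest {
    /** Builds a JSON atomic-update body that sets hiddenField to true on each document. */
    static String build(List<String> ids, String idField, String hiddenField) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append("{\"").append(idField).append("\":\"").append(escape(ids.get(i)))
                .append("\",\"").append(hiddenField).append("\":{\"set\":true}}");
        }
        return json.append(']').toString();
    }

    /** Escapes backslashes and double quotes only; enough for this sketch. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("doc1", "doc2"), "id", "hidden"));
        // prints [{"id":"doc1","hidden":{"set":true}},{"id":"doc2","hidden":{"set":true}}]
    }
}
```

The resulting body would be POSTed to the core's /update handler as JSON.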

Upayavira


