[jira] Updated: (SOLR-658) Allow Solr to load index from arbitrary directory in dataDir and Commit point

2008-10-08 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-658:
---

Attachment: SOLR-658.patch

Thanks Akshay.

This updated patch calls getNewIndexDir before calling IndexReader#reopen, so
that if the new index directory differs from the old one, we always create a
new SolrIndexSearcher on the new index directory.

I'd like to commit this in the next two or three days if there are no 
objections.
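Below is a minimal plain-Java sketch of the ordering described above. It assumes, hypothetically, that the name of the active index directory is stored in a properties file under the key `index`, and that `index` is used when no such entry exists. The class, the methods, and the key are illustrative and are not taken from the patch. The sketch resolves the new directory first. It reopens the existing reader only when that directory has not changed, because IndexReader#reopen works on the directory the reader was originally opened on.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical illustration: resolve the index directory first, then decide whether a reopen is possible. */
final class IndexDirSwitch {

    enum Action { REOPEN_CURRENT_READER, OPEN_NEW_SEARCHER }

    /** Assumed layout: an "index=dirName" entry; "index" is used when the entry is absent. */
    static String newIndexDir(String dataDir, Properties props) {
        String dirName = props.getProperty("index", "index").trim();
        return dataDir.endsWith("/") ? dataDir + dirName : dataDir + "/" + dirName;
    }

    /** Reopen only when the directory is unchanged; otherwise open a new searcher on the new directory. */
    static Action decide(String currentIndexDir, String newIndexDir) {
        return currentIndexDir.equals(newIndexDir)
                ? Action.REOPEN_CURRENT_READER
                : Action.OPEN_NEW_SEARCHER;
    }

    /** Parses properties text, e.g. the contents of a hypothetical index.properties file. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

Checking the directory before calling reopen means a switch to an alternate directory can never be missed. A reopen on the old directory would silently keep serving the old index.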

> Allow Solr to load index from arbitrary directory in dataDir and Commit point
> -
>
> Key: SOLR-658
> URL: https://issues.apache.org/jira/browse/SOLR-658
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-658.patch, SOLR-658.patch, SOLR-658.patch, 
> SOLR-658.patch
>
>
> This is a requirement for Java-based Solr replication.
> Use case for an arbitrary index directory:
> If the slave has a corrupted index and the filesystem does not allow
> overwriting files that are in use (NTFS), replication will fail. The solution
> is to copy the index from the master to an alternate directory on the slave
> and load the IndexReader/IndexWriter from this alternate directory.
> Use case for an arbitrary commit point:
> Replication can also provide a rollback feature. The rollback should be able
> to specify a commit point/generation so that rollback is possible.
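For the commit-point use case, a rollback request has to name a specific commit. The sketch below assumes the Lucene convention that each commit point is recorded in a `segments_N` file, with the generation N written in base 36 (`Character.MAX_RADIX`). The helper names are hypothetical. A real implementation would more likely use Lucene's own commit-point classes than parse file names.

```java
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

/** Hypothetical helpers for picking a commit point by generation for a rollback. */
final class CommitGenerations {

    static final String PREFIX = "segments_";

    /** Returns the generation encoded in a segments_N file name (N is base 36), if it is one. */
    static Optional<Long> generationOf(String fileName) {
        if (!fileName.startsWith(PREFIX) || fileName.length() == PREFIX.length()) {
            return Optional.empty(); // e.g. "segments.gen" or ordinary index data files
        }
        try {
            return Optional.of(Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** All commit generations present in a directory listing, in ascending order. */
    static TreeSet<Long> generations(List<String> fileNames) {
        TreeSet<Long> result = new TreeSet<>();
        for (String name : fileNames) {
            generationOf(name).ifPresent(result::add);
        }
        return result;
    }

    /** The segments file to load when rolling back to the given generation. */
    static String segmentsFileFor(List<String> fileNames, long generation) {
        if (!generations(fileNames).contains(generation)) {
            throw new IllegalArgumentException("no commit point with generation " + generation);
        }
        return PREFIX + Long.toString(generation, Character.MAX_RADIX);
    }
}
```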

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [jira] Updated: (SOLR-84) New Solr logo?

2008-10-08 Thread Shalin Shekhar Mangar
I think what Noble meant to say is that Apache sitting below Solr does not
look very good. Perhaps we can shift Apache to the left of Solr, or above it?

On Wed, Oct 8, 2008 at 10:36 AM, Lukáš Vlček <[EMAIL PROTECTED]> wrote:

> It seems so, according to the official requirements:
> http://wiki.apache.org/solr/LogoContest
>
> On Wed, Oct 8, 2008 at 6:44 AM, Noble Paul നോബിള്‍ नोब्ळ् <
> [EMAIL PROTECTED]> wrote:
>
> > Do we really need the APACHE under the Solr logo? The other one looks clean.
> >
> > On Wed, Oct 8, 2008 at 4:22 AM, Lukas Vlcek (JIRA) <[EMAIL PROTECTED]>
> > wrote:
> > >
> > > [ https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> > >
> > > Lukas Vlcek updated SOLR-84:
> > > 
> > >
> > >Attachment: solr_logo_it_is_burning.png
> > >
> > > It is burning! ... Apache Solr Logo contest submission (based on my
> > > previous draft http://picasaweb.google.cz/lukas.vlcek/Solr)
> > >
> > >> New Solr logo?
> > >> --
> > >>
> > >> Key: SOLR-84
> > >> URL: https://issues.apache.org/jira/browse/SOLR-84
> > >> Project: Solr
> > >>  Issue Type: Improvement
> > >>Reporter: Bertrand Delacretaz
> > >>Priority: Minor
> > >> Attachments: logo-grid.jpg, logo-solr-d.jpg, logo-solr-e.jpg,
> > >> logo-solr-source-files-take2.zip, solr-84-source-files.zip, solr-f.jpg,
> > >> solr-logo-20061214.jpg, solr-logo-20061218.JPG, solr-logo-20070124.JPG,
> > >> solr-nick.gif, solr.jpg, solr.s1.jpg, solr.svg, solr_logo_it_is_burning.png,
> > >> sslogo-solr-flare.jpg, sslogo-solr.jpg, sslogo-solr2-flare.jpg,
> > >> sslogo-solr2.jpg, sslogo-solr3.jpg
> > >>
> > >>
> > >> Following up on SOLR-76, our trainee Nicolas Barbay (nicolas (put at
> > >> here) sarraux-dessous.ch) has reworked his logo proposal to be more "solar".
> > >> This can either be the start of a logo contest, or if people like it we
> > >> could adopt it. The gradients can make it a bit hard to integrate, not sure
> > >> if this is really a problem.
> > >> WDYT?
> > >
> > > --
> > > This message is automatically generated by JIRA.
> > > -
> > > You can reply to this email to add a comment to the issue online.
> > >
> > >
> >
> >
> >
> > --
> > --Noble Paul
> >
>
>
>
> --
> http://blog.lukas-vlcek.com/
>



-- 
Regards,
Shalin Shekhar Mangar.


[jira] Commented: (SOLR-799) Add support for hash based exact/near duplicate document handling

2008-10-08 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637965#action_12637965
 ] 

Grant Ingersoll commented on SOLR-799:
--

Haven't looked at the patch, but I agree that it is wise to separate the 
detection of duplication from the handling of found duplicates.  The default 
can be to remove all as in the patch, but it should be easy to override.  
Scenarios I can see being useful:
1. Prevent new insert
2. Remove old (i.e. same as an update works now)
3.  Note the duplicate on the existing document in a "duplicates" field.  This 
obviously requires either deleting and re-adding the doc, or Lucene to better 
support appending/updating fields, maybe via the column-stride payloads (if 
that ever happens).  No need for this anytime soon.
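The separation Grant describes can be sketched as a detection step plus a pluggable handling policy. This is an in-memory illustration only. The `Doc` type, the policy names, and the map that stands in for the index are all hypothetical. In a real Lucene index, scenario 3 would still require deleting and re-adding the existing document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: duplicate detection kept separate from what is done with a duplicate. */
final class DuplicateHandling {

    /** The three scenarios from the comment, as interchangeable policies. */
    enum Policy { PREVENT_NEW_INSERT, REMOVE_OLD, NOTE_ON_EXISTING }

    static final class Doc {
        final String id;
        final String signature;
        final List<String> duplicates = new ArrayList<>(); // stands in for a "duplicates" field

        Doc(String id, String signature) {
            this.id = id;
            this.signature = signature;
        }
    }

    /** Detection only: returns an existing doc with the same signature, or null. */
    static Doc findDuplicate(Map<String, Doc> index, Doc incoming) {
        for (Doc existing : index.values()) {
            if (existing.signature.equals(incoming.signature)) {
                return existing;
            }
        }
        return null;
    }

    /** Handling: applies the chosen policy; returns true if the incoming doc was added. */
    static boolean add(Map<String, Doc> index, Doc incoming, Policy policy) {
        Doc existing = findDuplicate(index, incoming);
        if (existing == null) {
            index.put(incoming.id, incoming);
            return true;
        }
        switch (policy) {
            case PREVENT_NEW_INSERT:
                return false;
            case REMOVE_OLD:
                index.remove(existing.id);
                index.put(incoming.id, incoming);
                return true;
            case NOTE_ON_EXISTING:
                // In Lucene this would mean deleting and re-adding the existing doc.
                existing.duplicates.add(incoming.id);
                return false;
            default:
                throw new IllegalStateException("unknown policy " + policy);
        }
    }
}
```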


> Add support for hash based exact/near duplicate document handling
> -
>
> Key: SOLR-799
> URL: https://issues.apache.org/jira/browse/SOLR-799
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Mark Miller
>Priority: Minor
> Attachments: SOLR-799.patch
>
>
> Hash-based duplicate document detection is efficient and allows for blocking
> as well as field collapsing. Let's put it into Solr.
> http://wiki.apache.org/solr/Deduplication




Re: [jira] Updated: (SOLR-84) New Solr logo?

2008-10-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
Adding Apache just adds to the number of letters, so the logo was a bit
big. Draft #1 is cool.
--Noble




-- 
--Noble Paul


Re: EmbeddedSolrServer API usage

2008-10-08 Thread Grant Ingersoll
I get an NPE in create when trying this.  Seems the loader member is  
not instantiated for the container.create() call, since load is not  
called.



On Oct 2, 2008, at 10:37 AM, Ryan McKinley wrote:

You could also use the CoreContainer to create a Core from the  
descriptor:


   CoreContainer container = new CoreContainer();
   CoreDescriptor descriptor = new CoreDescriptor(container,
   "core1", "/Users/erik/apache-solr-1.3.0/example/solr");
   SolrCore core = container.create( descriptor );

if you are using a custom solrconfig name, you would need to call  
setConfigName( path ) on the descriptor.


As for closing...  have you tried core.close()?

ryan


On Oct 2, 2008, at 8:49 AM, Erik Hatcher wrote:

I'm doing some Java experiments to get ready for a solr-ruby  
overhaul such that JRuby comes into play nicely so that  
EmbeddedSolrServer can be used transparently too.  I've not tried  
this since the whole CoreContainer/CoreDescriptor stuff was added,  
and I don't quite understand it all.  Here's what I've got:


public static void main(String[] args) throws IOException,
    ParserConfigurationException, SAXException, SolrServerException {

  CoreContainer container = new CoreContainer();
  SolrConfig config = new SolrConfig("/Users/erik/apache-solr-1.3.0/example/solr", "solrconfig.xml", null);
  CoreDescriptor descriptor = new CoreDescriptor(container,
      "core1", "/Users/erik/apache-solr-1.3.0/example/solr");
  SolrCore core = new SolrCore("core1", "/Users/erik/apache-solr-1.3.0/example/solr/data", config, null, descriptor);

  container.register("core1", core, false);
  SolrServer solr = new EmbeddedSolrServer(container, "core1");
  SolrQuery query = new SolrQuery("*:*");
  QueryResponse response = solr.query(query);
  System.out.println("response = " + response);
}

This works, but has a fair bit of seemingly unnecessary
duplication, and it also leaves the JVM running for some
reason.


Is this the proper way to use EmbeddedSolrServer, or are there some  
tips to improving the code and reducing the duplication?


Also, why does the JVM keep running?  Are we spinning off a thread  
that needs to be shut down?  Is there some sort of close() call  
that is needed?


Thanks,
Erik





--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ










Re: [jira] Updated: (SOLR-84) New Solr logo?

2008-10-08 Thread Lukáš Vlček
Hi,
I am glad you like draft #1 (and actually I think the second design is
not totally lost: just wipe out the Apache letters and you get it). But the
problem is that draft #1 (as it is today) would not make it into the
contest because it violates the strongest requirement:

The logo must incorporate the full project name: Apache Solr

That is the assignment (http://wiki.apache.org/solr/LogoContest).
You can try to push the contest organizers, not me...

If you were to ask me whether I like the fact that the Apache word has to be
incorporated, I would tell you that I am not happy about it (but this
does not mean that I think one cannot create a perfect design with
the Apache word). The problem I see is that there are no official
rules on how the Apache word can be used in designs (which font, which
color...). The current mix of fonts in my second proposal is not ideal, but I am
wary of using an exotic font for Apache, because people are used to seeing
something like Arial Bold, and at the end of the day an overly exotic design
of Apache could be seen as a disadvantage.

Regards,
Lukas




-- 
http://blog.lukas-vlcek.com/


[jira] Commented: (SOLR-799) Add support for hash based exact/near duplicate document handling

2008-10-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637976#action_12637976
 ] 

Yonik Seeley commented on SOLR-799:
---

bq. I agree that it is wise to separate the detection of duplication from the 
handling of found duplicates

Though in some implementations (like #2, which may be the default), detecting 
that duplicate and handling it are truly coupled... forcing a decoupling would 
not be a good thing in that case.





Re: [jira] Updated: (SOLR-84) New Solr logo?

2008-10-08 Thread Andrzej Bialecki

Lukáš Vlček wrote:

> Hi,
> I am glad you like draft #1 (and actually I think the second design is
> not totally lost: just wipe out the Apache letters and you get it). But the
> problem is that draft #1 (as it is today) would not make it into the
> contest because it violates the strongest requirement:
>
> The logo must incorporate the full project name: Apache Solr
>
> That is the assignment (http://wiki.apache.org/solr/LogoContest).
> You can try to push the contest organizers, not me...


How about a layout like this one (hopefully the ASCII art makes it
through email...):


,--.  A p a c h e
|__   \"|"/  +   ,-+
   | -  O  - |   |_/
"--'  /,|.\  +-- | \


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



[jira] Commented: (SOLR-799) Add support for hash based exact/near duplicate document handling

2008-10-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638009#action_12638009
 ] 

Yonik Seeley commented on SOLR-799:
---

Some thoughts...

- How should different "types" be handled (for example, when we support binary
fields)? For example, different base64 encoders might use different line
lengths or different line endings (CR/LF). Perhaps it's good enough to say
that the string form must be identical, and leave it at that for now? The
alternative would be signatures based on the Lucene Document about to be
indexed.

- It would be nice to be able to calculate a signature for a document without
having to concatenate all the fields together.
Perhaps change calculate(String content) to something like
calculate(Iterable content)?

An alternative option would be incremental hashing...
{code}
Signature sig = ourSignatureCreator.create();
sig.add(f1);
sig.add(f2);
sig.add(f3);
String s = sig.getSignature();
{code}

Looking at how TextProfileSignature works, I'd lean toward incremental hashing
to avoid building yet another big string. Having a hashing object also opens up
the possibility of easily adding other method signatures for more efficient
hashing.

- It appears that if you put fields in a different order, the signature
will change.

- It appears that documents with different field names but the same content
will have the same signature.

- I don't understand the dedup logic in DUH2... it seems like we want to delete
by id and by sig... unfortunately there is no
  IndexWriter.updateDocument(Term[] terms, Document doc), so we'll have to do a
separate non-atomic delete on the sig for now, right?

- There's probably no need for a separate test solrconfig-deduplicate.xml if 
all it adds is an update processor.  Tests could just explicitly specify the 
update handler on updates.
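Yonik's incremental `create()`/`add()`/`getSignature()` shape can be made concrete with `java.security.MessageDigest`. MessageDigest accepts input in pieces through repeated `update` calls, so no concatenated string is ever built. This is a minimal sketch. The class name and the choice of MD5 are assumptions rather than the patch's API, and it returns `byte[]` instead of `String`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical incremental signature: fields are fed one at a time instead of being concatenated. */
final class IncrementalSignature {

    private final MessageDigest digest;

    IncrementalSignature() {
        try {
            digest = MessageDigest.getInstance("MD5"); // Java platforms are required to provide MD5
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Adds one field value, preceded by its byte length so "ab","c" and "a","bc" hash differently. */
    IncrementalSignature add(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int n = bytes.length;
        digest.update(new byte[] {(byte) (n >>> 24), (byte) (n >>> 16), (byte) (n >>> 8), (byte) n});
        digest.update(bytes);
        return this;
    }

    /** Completes the hash; MessageDigest.digest() also resets the digest for any further use. */
    byte[] getSignature() {
        return digest.digest();
    }
}
```

Each value is preceded by a length prefix. Without it, splitting the same characters differently across fields ("f1f" + "2" versus "f1" + "f2") would produce identical signatures.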





[jira] Commented: (SOLR-799) Add support for hash based exact/near duplicate document handling

2008-10-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638048#action_12638048
 ] 

Mark Miller commented on SOLR-799:
--

bq. I agree that it is wise to separate the detection of duplication from
the handling of found duplicates

bq. Though in some implementations (like #2, which may be the default), 
detecting that duplicate and handling it are truly coupled... forcing a 
decoupling would not be a good thing in that case.

Still looking at this. I was hoping to avoid any of the old 'if Solr crashes you
can have 2 docs with the same id in the index' type stuff. I guess I won't easily
get away with that. Hopefully we can make it so the default implementation can
still be as efficient and atomic.

bq. How should different "types" be handled (for example when we support binary 
fields). For example, different base64 encoders might use different line 
lengths or different line endings (CR/LF). Perhaps it's good enough to say that 
the string form must be identical, and leave it at that for now? The 
alternative would be signatures based on the Lucene Document about to be 
indexed.

Yeah, it may be best to worry about it when we support binary fields... it would be
nice to look forward, though. I think returning a byte[] rather than a String
will future-proof the sig implementations a bit along those lines (though it
doesn't address your point)... still mulling: this shouldn't trip up fuzzy
hashing implementations too much, and so how exact should MD5Signature be...

bq. *  It appears that if you put fields in a different order that the 
signature will change
bq. * It appears that documents with different field names but the same 
content will have the same signature.

Two good points I have addressed.
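The comment does not say how the two points were addressed. The sketch below shows one way to handle both, assuming one string value per field. It hashes the fields in sorted name order, so the input order no longer matters. It also hashes each field name along with its value, so renamed fields change the signature. The names are hypothetical, and using MD5 through `java.security.MessageDigest` is an arbitrary choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical signature over name/value pairs that ignores field order but not field names. */
final class FieldSignature {

    static byte[] signature(Map<String, String> fields) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        // TreeMap iterates in sorted key order, so the caller's field order does not matter.
        for (Map.Entry<String, String> field : new TreeMap<>(fields).entrySet()) {
            update(md, field.getKey());   // the name is hashed, so renaming a field changes the result
            update(md, field.getValue());
        }
        return md.digest();
    }

    /** Writes a 4-byte length prefix and then the UTF-8 bytes, keeping names and values unambiguous. */
    private static void update(MessageDigest md, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        md.update(new byte[] {(byte) (b.length >>> 24), (byte) (b.length >>> 16), (byte) (b.length >>> 8), (byte) b.length});
        md.update(b);
    }
}
```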

bq. It would be nice to be able to calculate a signature for a document w/o 
having to catenate all the fields together.
Perhaps change calculate(String content) to something like 
calculate(Iterable content)?

I like the idea of incremental as well.

bq. I don't understand the dedup logic in DUH2... it seems like we want to 
delete by id and by sig... unfortunately there is no
IndexWriter.updateDocument(Term[] terms, Document doc) so we'll have to do a 
separate non-atomic delete on the sig for now, right?

Another one I was hoping to get away with. My current strategy was to say that
setting an update term means that updating by id is overridden and *only* the
update Term is used: effectively, the update Term (signature) becomes the
update id, and you can control whether the id factors into that update
signature or not. Didn't get past the goalie, I suppose. I guess I give up
on a clean atomic implementation and perhaps investigate update(terms[], doc) for
the future. I wanted to deal with both signature and id, but figured it's best to
start with the most efficient bare bones and work out from there.
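Lucene's `IndexWriter.updateDocument(Term, Document)` deletes by a single term and adds the new document in one call. `IndexWriter.deleteDocuments(Term)` is a separate operation. That is why deleting by both id and signature ends up as two steps. The toy in-memory model below shows that two-step flow, and all its names are hypothetical. Nothing in it ties the two steps together as a single unit, which is the non-atomicity under discussion.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical toy index: delete by signature, then update by id, as two separate steps. */
final class TwoTermUpdate {

    static final class Doc {
        final String id;
        final String sig;

        Doc(String id, String sig) {
            this.id = id;
            this.sig = sig;
        }
    }

    final Map<String, Doc> byId = new LinkedHashMap<>();

    /** Step 1: remove every stored doc that has the given signature. */
    int deleteBySignature(String sig) {
        int removed = 0;
        for (Iterator<Doc> it = byId.values().iterator(); it.hasNext(); ) {
            if (it.next().sig.equals(sig)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Step 2: the usual update by id (replaces any doc with the same id). */
    void updateById(Doc doc) {
        byId.put(doc.id, doc);
    }

    /** The two steps run one after the other; nothing makes them a single atomic unit. */
    void add(Doc doc) {
        deleteBySignature(doc.sig);
        updateById(doc);
    }
}
```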

bq. There's probably no need for a separate test solrconfig-deduplicate.xml if 
all it adds is an update processor. Tests could just explicitly specify the 
update handler on updates.

It's mainly for me at the moment (testing config settings loading and whatnot);
I'll be sure to pull it once the patch is done.

Thanks for all of the feedback.





Re: EmbeddedSolrServer API usage

2008-10-08 Thread Grant Ingersoll

So, the answer to my own question is:

I was getting:
java.lang.NullPointerException
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:315)
	at com.grantingersoll.noodles.EmbeddedTest.testEmbedded(EmbeddedTest.java:23)

	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:40)


For the code:
CoreContainer container = new CoreContainer();
CoreDescriptor descriptor = new CoreDescriptor(container,  
"spell", "src/main/resources/solr");

SolrCore core = container.create(descriptor);
final SolrServer client = new EmbeddedSolrServer(container,  
"spell");

assertTrue("client is null and it shouldn't be", client != null);

This is b/c CoreContainer contains the following code:
File idir = new File(dcore.getInstanceDir());
if (!idir.isAbsolute()) {
  idir = new File(loader.getInstanceDir(), dcore.getInstanceDir());
}

The problem is that loader is not initialized yet, because load() isn't called.
Switching to an absolute path is a workaround, but I suppose this is a
bug.
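The relative-path resolution Grant quotes can be reproduced with `java.io.File` alone. This is a sketch, not the actual fix. `resolve` mirrors the quoted lines, with one change for a `null` loader directory, which is the situation when load() was never called. In that case it resolves against the JVM working directory instead of dereferencing null. The class name, the method name, and this fallback are all hypothetical.

```java
import java.io.File;

/** Hypothetical mirror of the quoted CoreContainer path logic, with a guard for an uninitialized loader. */
final class InstanceDirResolver {

    /**
     * Resolves a core's instance dir. A relative path is resolved against the loader's
     * instance dir when one is known; if it is null (the case when load() was never called),
     * this sketch resolves against the JVM working directory instead of throwing an NPE.
     */
    static File resolve(String loaderInstanceDir, String coreInstanceDir) {
        File idir = new File(coreInstanceDir);
        if (!idir.isAbsolute()) {
            idir = (loaderInstanceDir != null)
                    ? new File(loaderInstanceDir, coreInstanceDir)
                    : idir.getAbsoluteFile();
        }
        return idir;
    }
}
```

On the calling side, the workaround Grant mentions amounts to passing `new File("src/main/resources/solr").getAbsolutePath()` as the instance directory to the `CoreDescriptor` constructor. With an absolute path, the `isAbsolute()` branch is never taken.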


-Grant






[jira] Assigned: (SOLR-721) DirectSolrConnection is broken - missing CoreContainer initialization

2008-10-08 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley reassigned SOLR-721:
--

Assignee: Ryan McKinley

> DirectSolrConnection is broken - missing CoreContainer initialization
> -
>
> Key: SOLR-721
> URL: https://issues.apache.org/jira/browse/SOLR-721
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.3
>Reporter: Henri Biestro
>Assignee: Ryan McKinley
> Fix For: 1.4
>
>
> It might be initialized in such a way that no core container is created.
> Adding the proper includes & a member:
> {code}
> final CoreContainer cores;
> {code}
> And modifying the constructor:
> {code}
>   public DirectSolrConnection( String instanceDir, String dataDir, String loggingPath )
>   {
>     // If a loggingPath is specified, try using that (this needs to happen first)
>     if( loggingPath != null ) {
>       File loggingConfig = new File( loggingPath );
>       if( !loggingConfig.exists() && instanceDir != null ) {
>         loggingConfig = new File( new File(instanceDir), loggingPath );
>       }
>       if( loggingConfig.exists() ) {
>         System.setProperty("java.util.logging.config.file", loggingConfig.getAbsolutePath() );
>       }
>       else {
>         throw new SolrException( SolrException.ErrorCode.SERVER_ERROR, "can not find logging file: "+loggingConfig );
>       }
>     }
>
>     // Initialize CoreContainer
>     try {
>       cores = new CoreContainer(new SolrResourceLoader(SolrResourceLoader.locateInstanceDir()));
>       SolrConfig solrConfig = new SolrConfig(instanceDir, SolrConfig.DEFAULT_CONF_FILE, null);
>       CoreDescriptor dcore = new CoreDescriptor(cores, "", solrConfig.getResourceLoader().getInstanceDir());
>       IndexSchema indexSchema = new IndexSchema(solrConfig, instanceDir+"/conf/schema.xml", null);
>       core = new SolrCore( null, dataDir, solrConfig, indexSchema, dcore);
>       cores.register("", core, false);
>       // 'config' is not declared in this snippet; presumably the SolrConfig created above is meant
>       parser = new SolrRequestParsers( solrConfig );
>     }
>     catch (Exception ee) {
>       throw new RuntimeException(ee);
>     }
>   }
> {code}
> Should take care of this case.




[jira] Resolved: (SOLR-721) DirectSolrConnection is broken - missing CoreContainer initialization

2008-10-08 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley resolved SOLR-721.


Resolution: Fixed

fixed in trunk...
thanks Henri!




[jira] Assigned: (SOLR-736) SolrCore.getSolrCore() may create a SolrCore without a CoreContainer

2008-10-08 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley reassigned SOLR-736:
--

Assignee: Ryan McKinley

> SolrCore.getSolrCore() may create a SolrCore without a CoreContainer
> 
>
> Key: SOLR-736
> URL: https://issues.apache.org/jira/browse/SOLR-736
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.3
>Reporter: Henri Biestro
>Assignee: Ryan McKinley
>
> The method is deprecated, but one can still initialize and start working this way.
> Potential fix could be:
> {code}
>   @Deprecated
>   public static SolrCore getSolrCore() {
>     synchronized( SolrCore.class ) {
>       if( instance == null ) {
>         try {
>           // sets 'instance' to the latest solr core
>           CoreContainer.Initializer init = new CoreContainer.Initializer();
>           instance = init.initialize().getCore("");
>         } catch(Exception xany) {
>           throw new SolrException( SolrException.ErrorCode.SERVER_ERROR,
>               "error creating core", xany );
>         }
>       }
>     }
>     return instance;
>   }
> {code}
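The proposed fix makes the deprecated static accessor obtain its core from a freshly initialized container instead of constructing a bare core. A self-contained sketch of that pattern, assuming hypothetical Core and Container stand-ins (not Solr's SolrCore/CoreContainer): locking on the class ensures lazy initialization happens at most once, and every core handed out is registered in a container.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for SolrCore: remembers the container that owns it.
final class Core {
    final Container owner;
    Core(Container owner) { this.owner = owner; }
}

// Hypothetical stand-in for CoreContainer: creates and registers cores by name.
final class Container {
    private final Map<String, Core> cores = new HashMap<>();

    // Plays the role of CoreContainer.Initializer#initialize(): a container with a default "" core.
    static Container initialize() {
        Container c = new Container();
        c.cores.put("", new Core(c));
        return c;
    }

    Core getCore(String name) { return cores.get(name); }
}

final class LegacyAccess {
    private static Core instance;

    private LegacyAccess() {}

    // Deprecated-style static accessor: never creates a core outside a container.
    @Deprecated
    static Core getCore() {
        synchronized (LegacyAccess.class) {
            if (instance == null) {
                instance = Container.initialize().getCore("");
            }
            return instance;
        }
    }
}
```

Taking the lock on every call is slightly slower than double-checked locking, but it is simple and obviously correct, which suits a deprecated compatibility path.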




[jira] Resolved: (SOLR-736) SolrCore.getSolrCore() may create a SolrCore without a CoreContainer

2008-10-08 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley resolved SOLR-736.


   Resolution: Fixed
Fix Version/s: 1.4

Thanks, Henri!

> SolrCore.getSolrCore() may create a SolrCore without a CoreContainer
> 
>
> Key: SOLR-736
> URL: https://issues.apache.org/jira/browse/SOLR-736
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.3
>Reporter: Henri Biestro
>Assignee: Ryan McKinley
> Fix For: 1.4
>
>




[jira] Created: (SOLR-805) DisMax queries are not being cached in QueryResultCache

2008-10-08 Thread Todd Feak (JIRA)
DisMax queries are not being cached in QueryResultCache
---

 Key: SOLR-805
 URL: https://issues.apache.org/jira/browse/SOLR-805
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
 Environment: Using Sun JDK 1.5 and Solr 1.3.0 release on Windows XP
Reporter: Todd Feak
Priority: Critical


I have a DisMax Search Handler set up in my solrconfig.xml to weight results 
based on which field a hit was found in. Results seem to be coming back fine, 
but the exact same query issued twice will *not* result in a cache hit.

I have run far enough in the debugger to determine that the hashCode for the 
BooleanQuery object returns a different value each time for the same 
query. This leads me to believe there is some random factor involved in its 
calculation, such as a default Object hashCode() implementation somewhere in 
the chain. Non-DisMax queries seem to be cached just fine.

I see this behavior on line 47 of the QueryResultKey constructor. I have not 
dug in far enough to determine exactly where the hashCode is being 
incorrectly calculated. I will try to dig in further tomorrow, but wanted to 
get some attention on the bug.
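The symptom described (the same query producing a different hashCode each time) is what happens when some object inside a cache key falls back to Object's identity-based hashCode()/equals(). A minimal illustration with hypothetical classes, not Lucene's BooleanQuery or Solr's QueryResultKey: a HashMap-backed cache misses for structurally identical keys whose parts lack value-based equals/hashCode, and hits once those methods are implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical clause WITHOUT equals/hashCode: inherits identity semantics from Object.
final class IdentityClause {
    final String field;
    final float boost;
    IdentityClause(String field, float boost) { this.field = field; this.boost = boost; }
}

// Same data, but with value-based equals/hashCode.
final class ValueClause {
    final String field;
    final float boost;
    ValueClause(String field, float boost) { this.field = field; this.boost = boost; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ValueClause)) return false;
        ValueClause other = (ValueClause) o;
        return field.equals(other.field) && Float.compare(boost, other.boost) == 0;
    }

    @Override public int hashCode() { return Objects.hash(field, boost); }
}

final class CacheDemo {
    private CacheDemo() {}

    // Builds "the same query" twice and reports whether the second lookup hits.
    static boolean secondLookupHits(boolean valueSemantics) {
        Map<List<Object>, String> cache = new HashMap<>();
        cache.put(build(valueSemantics), "cached results");
        return cache.containsKey(build(valueSemantics));
    }

    // A query key modeled as a list of weighted-field clauses (like a DisMax over two fields).
    private static List<Object> build(boolean valueSemantics) {
        List<Object> clauses = new ArrayList<>();
        if (valueSemantics) {
            clauses.add(new ValueClause("title", 2.0f));
            clauses.add(new ValueClause("body", 1.0f));
        } else {
            clauses.add(new IdentityClause("title", 2.0f));
            clauses.add(new IdentityClause("body", 1.0f));
        }
        return clauses;
    }
}
```

Because List.hashCode() and List.equals() delegate to the elements, a single identity-based element anywhere in the key is enough to defeat the cache.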




[jira] Updated: (SOLR-805) DisMax queries are not being cached in QueryResultCache

2008-10-08 Thread Todd Feak (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Feak updated SOLR-805:
---

Comment: was deleted

> DisMax queries are not being cached in QueryResultCache
> ---
>
> Key: SOLR-805
> URL: https://issues.apache.org/jira/browse/SOLR-805
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.3
> Environment: Using Sun JDK 1.5 and Solr 1.3.0 release on Windows XP
>Reporter: Todd Feak
>Priority: Critical
>




[jira] Updated: (SOLR-486) Support binary formats for QueryresponseWriter

2008-10-08 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-486:


Attachment: optimizemap.patch

Just as NamedList keys can be externalized, Map keys can be externalized too, 
and this is backward compatible.

Maps are not used very commonly in Solr, but SOLR-561 uses maps for 
master-slave communication.
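"Externalizing" keys means writing each distinct key string in full only once and referring to it by a small index afterwards, which pays off when many maps share the same keys. A generic sketch of the idea, assuming an invented byte layout (the NEW_KEY/KEY_REF tags and the class name are hypothetical and are NOT Solr's javabin format); values are plain Strings to keep it short.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical encoder illustrating key externalization with an invented layout.
final class ExternalizingKeyWriter {
    static final byte NEW_KEY = 1;   // followed by the key string itself
    static final byte KEY_REF = 2;   // followed by the index of a key written earlier

    private final DataOutputStream out;
    private final Map<String, Integer> seen = new HashMap<>();

    ExternalizingKeyWriter(DataOutputStream out) { this.out = out; }

    // Writes a key in full the first time, and as an integer reference afterwards.
    void writeKey(String key) throws IOException {
        Integer idx = seen.get(key);
        if (idx == null) {
            seen.put(key, seen.size());
            out.writeByte(NEW_KEY);
            out.writeUTF(key);
        } else {
            out.writeByte(KEY_REF);
            out.writeInt(idx);
        }
    }

    // Writes a list of maps that may share keys: map count, then per map its size and entries.
    void writeMaps(List<Map<String, String>> maps) throws IOException {
        out.writeInt(maps.size());
        for (Map<String, String> m : maps) {
            out.writeInt(m.size());
            for (Map.Entry<String, String> e : m.entrySet()) {
                writeKey(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
    }

    static byte[] encode(List<Map<String, String>> maps) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(bytes);
        new ExternalizingKeyWriter(data).writeMaps(maps);
        data.flush();
        return bytes.toByteArray();
    }
}
```

A decoder would mirror this by keeping its own list of keys in first-seen order, so no key table needs to be sent up front.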



> Support binary formats for QueryresponseWriter
> --
>
> Key: SOLR-486
> URL: https://issues.apache.org/jira/browse/SOLR-486
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java, search
>Reporter: Noble Paul
>Assignee: Yonik Seeley
> Fix For: 1.3
>
> Attachments: optimizemap.patch, SOLR-486-iterator.patch, 
> SOLR-486-iterator.patch, SOLR-486.patch, solr-486.patch, SOLR-486.patch, 
> SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, 
> SOLR-486.patch, SOLR-486.patch, SOLR-486.patch
>
>
> QueryResponseWriter only allows text data to be written, so it is not 
> possible to implement a binary protocol. Create another interface which has 
> a method:
> write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response)
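The requested interface differs from the text-based writer mainly in taking a raw OutputStream instead of a character Writer. A self-contained sketch, assuming hypothetical Request and Response stand-ins for SolrQueryRequest/SolrQueryResponse; the interface names and the byte layout of SimpleBinaryWriter are also invented for illustration.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for SolrQueryRequest / SolrQueryResponse.
final class Request {}
final class Response {
    final Map<String, Integer> values = new LinkedHashMap<>();
}

// Text-only contract: output must go through a character Writer.
interface TextResponseWriter {
    void write(Writer writer, Request request, Response response) throws IOException;
}

// The additional contract proposed in the issue: raw bytes via an OutputStream.
interface BinaryResponseWriter {
    void write(OutputStream os, Request request, Response response) throws IOException;
}

// Example binary implementation: entry count, then (key, 4-byte int) pairs.
final class SimpleBinaryWriter implements BinaryResponseWriter {
    @Override
    public void write(OutputStream os, Request request, Response response) throws IOException {
        DataOutputStream out = new DataOutputStream(os);
        out.writeInt(response.values.size());
        for (Map.Entry<String, Integer> e : response.values.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue());
        }
        out.flush();
    }
}
```

Keeping the binary contract as a separate interface leaves existing text writers untouched; the caller can check which interface a writer implements and hand it either a Writer or the underlying stream.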
