Re: How to manage resource out of index?

2010-07-06 Thread Rebecca Watson
hi li,

i looked at doing something similar - where we only index the text
but retrieve search results / highlight from files -- we ended up giving
up because of the amount of customisation required in solr -- mainly
because we wanted the distributed search functionality in solr, which
meant making sure the original file ended up on the same file system
(i.e. machine) too!

we ended up just storing the main text field too, even though there was
quite a bit of text -- in the end solr/lucene can handle the index size fine,
and disk space is cheaper than the man-hours to customise solr/lucene to work
in this way!
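
for reference -- "storing" a field is just the stored flag on its
definition in schema.xml (the field and type names here are only an
example, yours will differ):

  <field name="text" type="text" indexed="true" stored="true"/>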

that was our conclusion anyway and it works fine -- we also have
separate index / search server(s) so we don't care about merge time
either -- and as i said above - we use the distributed search so don't tend
to need to merge very large indexes anyway.
when your system grows / you go into production you'll probably split
the indexes anyway to use solr's distributed search functionality for the
sake of query speed.

hope that helps,

bec :)

On 7 July 2010 14:07, Li Li  wrote:
> I used to store the full text in the lucene index. But I found it's very
> slow when merging the index, because merging 2 segments copies the
> fdt files into a new one. So I want to index the full text only. But when
> searching I need the full text for applications such as highlighting and
> viewing the full text. I can store the full text as (url, text) pairs in a
> database and load it into memory. And when I search in lucene (or solr),
> I retrieve the url of the doc first, then use the url to get the full text.
> But when they are stored separately, they are hard to manage. They may not
> be consistent with each other. Does lucene or solr provide any method to
> ease this problem? Or does anyone have experience with this problem?
>


How to manage resource out of index?

2010-07-06 Thread Li Li
I used to store the full text in the lucene index. But I found it's very
slow when merging the index, because merging 2 segments copies the
fdt files into a new one. So I want to index the full text only. But when
searching I need the full text for applications such as highlighting and
viewing the full text. I can store the full text as (url, text) pairs in a
database and load it into memory. And when I search in lucene (or solr),
I retrieve the url of the doc first, then use the url to get the full text.
But when they are stored separately, they are hard to manage. They may not
be consistent with each other. Does lucene or solr provide any method to
ease this problem? Or does anyone have experience with this problem?


Re: document level security: indexing/searching techniques

2010-07-06 Thread Glen Newton
You could implement a good solution with the underlying Lucene ParallelReader
http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/ParallelReader.html
Keep the 100 search fields - 'static' info - in one index, the
permissions info in another index that gets updated when the
permissions change.
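
A rough sketch of the idea (paths are made up, error handling omitted;
note that ParallelReader requires both indexes to contain the same
documents in the same order, so docids line up):

  import java.io.File;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.ParallelReader;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.store.FSDirectory;

  public class AclSearch {
    public static void main(String[] args) throws Exception {
      // big, rarely-rebuilt index holding the ~100 "static" search fields
      IndexReader staticIdx =
          IndexReader.open(FSDirectory.open(new File("/indexes/static")));
      // small index holding only the permission fields; cheap to rebuild
      IndexReader permsIdx =
          IndexReader.open(FSDirectory.open(new File("/indexes/perms")));

      ParallelReader pr = new ParallelReader();
      pr.add(staticIdx);  // fields from both readers appear as one document
      pr.add(permsIdx);
      IndexSearcher searcher = new IndexSearcher(pr);
      // searches can now combine static fields and permission fields
    }
  }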

Does SOLR expose this kind of functionality?

-Glen Newton
http://zzzoot.blogspot.com/
http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html

On 7 July 2010 00:38, RL  wrote:
>
> I've a question about indexing/searching techniques in relation to document
> level security.
> I'm planning a system that has, let's say, about 1 million search documents
> with about 100 search fields each. Most of them are unstored to keep the
> index size low, because some of them can contain a few kilobytes and some
> several hundred kilobytes. Two of these search fields are for permission
> checking, where I keep the explicitly allowed and explicitly disallowed
> users and usergroups. (Usergroups can be in a hierarchical structure with
> permission inheritance.)
>
> So when a user searches in the system, his user id and the ids of his
> usergroup memberships are added as a filter query in my application logic
> before the query is sent to solr. So far so good for the searching part.
>
> But the problem is that the permissions can be changed by administrators
> of the system, requiring a re-index of the two permission search fields.
>
> first idea:
> Partial updates of index entries are not possible, so I would need to fetch
> all 1 million documents from a database and re-index them just because some
> permissions changed. The fetching process is rather expensive and requires
> more than 14 hours. I am sure this can be optimized of course, but I would
> rather avoid re-indexing all content.
>
> second idea:
> Another idea would be to store just the permissions in one small and
> fast-to-update index, and all the other stuff in the other huge and
> not-so-often-updated index. But I didn't find any possibility to combine
> these two indices in one query. Is that even possible?
>
>
> Does somebody have experience with these topics, or can anyone give advice
> on how to solve this properly?
> Thanks in advance.
>





Re: document level security: indexing/searching techniques

2010-07-06 Thread Lance Norskog
What Ken describes is called 'role-based' security. Users have roles,
and security items talk about roles, not users.

http://en.wikipedia.org/wiki/Role-based_access_control

On Tue, Jul 6, 2010 at 3:15 PM, Peter Sturge  wrote:
> Yes, you don't want to hard code permissions into your index - it will give
> you headaches.
>
> You might want to have a look at SOLR-1872:
> https://issues.apache.org/jira/browse/SOLR-1872 .
> This patch provides doc level security through an external ACL mechanism (in
> this case, an XML file) controlling a filter query.
> This way, you don't need to change the schema - you can even use existing
> indexes, and you can change access control without affecting your stored
> data.
>
> HTH,
> Peter
>
>
> On Tue, Jul 6, 2010 at 5:16 PM, Ken Krugler 
> wrote:
>
>>
>> On Jul 6, 2010, at 8:27am, osocurious2 wrote:
>>
>>
>>> Someone else was recently asking a similar question (or maybe it was you
>>> but
>>> worded differently :) ).
>>>
>>> Putting user level security at a document level seems like a recipe for
>>> pain. Solr/Lucene don't do frequent update well...and being highly
>>> optimized
>>> for query, I don't blame them. Is there any way to create a series of
>>> roles
>>> that you can apply to your documents? If the security level of the
>>> document
>>> isn't changing, just the user access to them, give the docs a role in the
>>> index, put your user/usergroup stuff in a DB or some other system and
>>> resolve your user into valid roles, then FilterQuery on role.
>>>
>>
>> You're right, baking in too fine-grained a level of security information is
>> a bad idea.
>>
>> As one example that worked pretty well for code search with Krugle, we set
>> access control on a per project level using LDAP groups - ie each project
>> had some number of groups that were granted access rights. Each file in the
>> project would inherit the same list of groups.
>>
>> Then, when a user logs in they get authenticated via LDAP, and we have the
>> set of groups they belong to being returned by the LDAP server. This then
>> becomes a fairly well-bounded list of "terms" for an OR query against the
>> "acl-groups" field in each file/project document. Just don't forget to set
>> the boost to 0 for that portion of the query :)
>>
>> -- Ken
>>
>> 
>> Ken Krugler
>> +1 530-210-6378
>> http://bixolabs.com
>> e l a s t i c   w e b   m i n i n g
>>
>>
>>
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: general debugging techniques?

2010-07-06 Thread Lance Norskog
Ah! I did not notice the 'too many open files' part. This means that
your mergeFactor setting is too high for what your operating system
allows. The default mergeFactor is 10 (which translates into thousands
of open file descriptors). You should lower this number.
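
For example, in solrconfig.xml (the value 4 is only an illustration; tune
it against your OS file-descriptor limit, which you can see with ulimit -n):

  <mainIndex>
    <mergeFactor>4</mergeFactor>
  </mainIndex>

Raising the OS limit itself is the other common fix.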

On Tue, Jul 6, 2010 at 1:14 PM, Jim Blomo  wrote:
> On Sat, Jul 3, 2010 at 1:10 PM, Lance Norskog  wrote:
>> You don't need to optimize, only commit.
>
> OK, thanks for the tip, Lance.  I thought the "too many open files"
> problem was because I wasn't optimizing/merging frequently enough.  My
> understanding of your suggestion is that commit also does merging, and
> since I am only building the index, not querying or updating it, I
> don't need to optimize.
>
>> This means that the JVM spends 98% of its time doing garbage
>> collection. This means there is not enough memory.
>
> I'll increase the memory to 4G, decrease the documentCache to 5 and try again.
>
>> I made a mistake - the bug in Lucene is not about PDFs - it happens
>> with every field in every document you index in any way- so doing this
>> in Tika outside Solr does not help. The only trick I can think of is
>> to alternate between indexing large and small documents. This way the
>> bug does not need memory for two giant documents in a row.
>
> I've checked out and built solr from branch_3x with the
> tika-0.8-SNAPSHOT patch.  (Earlier I was having trouble with Tika
> crashing too frequently.)  I've confirmed that LUCENE-2387 is fixed in
> this branch so hopefully I won't run into that this time.
>
>> Also, do not query the indexer at all. If you must, don't do sorted or
>> faceting requests. These eat up a lot of memory that is only freed
>> with the next commit (index reload).
>
> Good to know, though I have not been querying the index and definitely
> haven't ventured into faceted requests yet.
>
> The advice is much appreciated,
>
> Jim
>



-- 
Lance Norskog
goks...@gmail.com


index format error because disk full

2010-07-06 Thread Li Li
the index file is ill-formated because disk full when feeding. Can I
roll back to last version? Is there any method to avoid unexpected
errors when indexing? attachments are my segment_N


Re: Deleting Terms:

2010-07-06 Thread Erick Erickson
That's because deleting a document simply marks it as deleted;
it doesn't really do much else with it. All that work is deferred
to the optimize step, as you've found.

But deleted documents will NOT be found even though the
admin page shows their terms still in the index.
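
In SolrJ terms (URL and query are placeholders), the sequence that clears
out those leftover terms looks something like:

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class DeleteAndOptimize {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      server.deleteByQuery("field:value"); // only marks matching docs deleted
      server.commit();                     // makes the deletes visible to searches
      server.optimize();                   // rewrites segments, dropping deleted terms
    }
  }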

Best
Erick

On Tue, Jul 6, 2010 at 1:20 PM, Kumaravel Kandasami <
kumaravel.kandas...@gmail.com> wrote:

> FYI - the optimize() operation solved the issue.
>
>
> Kumar_/|\_
> www.saisk.com
> ku...@saisk.com
> "making a profound difference with knowledge and creativity..."
>
>
> On Tue, Jul 6, 2010 at 11:47 AM, Kumaravel Kandasami <
> kumaravel.kandas...@gmail.com> wrote:
>
> > BTW, Using SOLRJ - javabin api.
> >
> >
> >
> > Kumar_/|\_
> > www.saisk.com
> > ku...@saisk.com
> > "making a profound difference with knowledge and creativity..."
> >
> >
> > On Tue, Jul 6, 2010 at 11:43 AM, Kumaravel Kandasami <
> > kumaravel.kandas...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >>How to delete the terms associated with the document ?
> >>
> >> Current scenario: We are deleting documents based on a query
> >> ('field:value').
> >> The documents are getting deleted, however, the old terms associated to
> >> the field are displayed in the admin.
> >>
> >> How do we make SOLR to re-evaluate and update the terms associated to a
> >> specific fields or latest updated document ?
> >>
> >> (I am assuming we are missing some api calls .)
> >>
> >> Thank you.
> >>
> >>
> >> Kumar_/|\_
> >> www.saisk.com
> >> ku...@saisk.com
> >> "making a profound difference with knowledge and creativity..."
> >>
> >
> >
>


Re: Relevancy and non-matching words

2010-07-06 Thread Erick Erickson
Underneath SOLR is Lucene. Here's a description of
Lucene's scoring algorithm (follow the "Similarity" link)
http://lucene.apache.org/java/2_4_0/scoring.html#Understanding%20the%20Scoring%20Formula

The number of letters in non-matching words isn't relevant; what is
relevant is the relationship between the number of search terms
found and the number of tokens (think of them as words)
in the field.
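
(Concretely: Lucene's DefaultSimilarity computes lengthNorm = 1/sqrt(numTerms)
for the field, and that norm is quantized into a single byte when stored.
The quantization is why the score only drops when the word count crosses
certain thresholds rather than changing smoothly. If you don't want field
length to matter at all, omitNorms="true" on the field removes this factor
entirely.)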

I'm also assuming you've either set the default operator to
AND or that your default field is "title".

Using &debugQuery=on will show you a lot. You can also
access that information from the admin pages (Full Interface
link or something like that).

HTH
Erick

On Tue, Jul 6, 2010 at 12:17 PM, dbashford  wrote:

>
> Is there some sort of threshold that I can tweak which sets how many
> letters
> in non-matching words makes a result more or less relevant?
>
> Searching on title, q=fantasy football, and I get this:
>
> {"title":"The Fantasy Football Guys",
> "score":2.8387074},
> {"title":"Fantasy Football Bums",
> "score":2.8387074},
> {"title":"Fantasy Football Xtreme",
> "score":2.7019854},
> {"title":"Fantasy Football Fools",
> "score":2.7019634},
> {"title":"Fantasy Football Brothers",
> "score":2.5917912}
>
> (I have some other scoring things in there that account for the difference
> between Xtreme and Fools.)
>
> The behavior I'm noticing is that there is some threshold for the length of
> non-matching words that, when tripped, kicks the score down a notch. 4 to 5
> letters seems to trip one threshold, 6 to 7 another.
>
> I would really like something like "Bums" to score the same as "Xtreme" and
> "Brothers" and let my other criterion determine which document should come
> out on top.  Is there something that can be tweaked to get this to happen?
>
> Or is my assumption a bit off base?
>
>
>


Re: Adding new elements to index

2010-07-06 Thread Erick Erickson
First, do you have a unique key defined in your schema.xml? If you
do, some of those 300 rows could be replacing earlier rows.

You say: " if I have 200
rows indexed from postgres and 100 rows from Oracle, the full-import process
only indexes 200 documents from oracle, although it shows clearly that the
query retruned 300 rows."

Which really looks like a typo: if you have 100 rows from Oracle, how
did you get 200 documents from Oracle?

Are you perhaps doing this in two different jobs and deleting the
first import before running the second?

And if this is irrelevant, could you provide more details like how you're
indexing things (I'm assuming DIH, but you don't state that anywhere).
If it *is* DIH, providing that configuration would help.

Best
Erick

On Tue, Jul 6, 2010 at 11:19 AM, Xavier Rodriguez  wrote:

> Hi,
>
> I have a SOLR installed on a Tomcat application server. This solr instance
> has some data indexed from a postgres database. Now I need to add some
> entities from an Oracle database. When I run the full-import command, the
> documents indexed are only documents from postgres. In fact, if I have 200
> rows indexed from postgres and 100 rows from Oracle, the full-import
> process
> only indexes 200 documents from oracle, although it shows clearly that the
> query returned 300 rows.
>
> I'm not doing a delta-import, simply a full import. I've tried to clean the
> index, reload the configuration, and manually remove dataimport.properties
> because it's the only metadata I found.  Is there any other file to check
> or
> modify just to get all 300 rows indexed?
>
> Of course, I tried to find one of that oracle fields, with no results.
>
> Thanks a lot,
>
> Xavier Rodriguez.
>


Re: Wildcards queries

2010-07-06 Thread Erick Erickson
Still not enough info.

Please show:
1> the field type (not field, but field type showing the analyzers for the
field you're interested in).
2> example data you've indexed
3> the query you submit
4> the response from the query (especially with &debugQuery=on appended to
the query).

Otherwise, it's really hard to guess what's going on.

HTH
Erick

On Tue, Jul 6, 2010 at 9:58 AM, Robert Naczinski <
robert.naczin...@googlemail.com> wrote:

> Hi,
>
> thanks for the reply. I am an absolute beginner with Solr.
>
> I have taken, for the beginning, the configuration from
> {solr.home}example/solr .
>
In solrconfig.xml all the query parsers are commented out ;-( Where can I
find the QueryParser? Javadoc, wiki?
>
> Regards,
>
> Robert
>
> 2010/7/6 Mark Miller :
> > On 7/6/10 8:53 AM, Robert Naczinski wrote:
> >> Hi,
> >>
> >> we use in our application EmbeddedSolrServer.
> >
> > Great!
> >
> >> Everything went fine.
> >
> > Excellent!
> >
> >> Now I want use wildcards queries.
> >
> > Cool!
> >
> >>
> >> It does not work.
> >
> > Bummer!
> >
> >> Must be adapted for the schema.xml?
> >
> > Not necessarily...
> >
> >>
> >> Can someone help me?
> >
> > We can try!
> >
> >>In wiki, I find nothing?
> >
> > No, you will find lots!
> >
> >> Why do I need simple
> >> example or link.
> >
> > Because it would be helpful!
> >
> >
> >>
> >> Regards,
> >>
> >> Robert
> >
> >
> > What query parser are you using? Dismax? That query parser does not
> > support wildcards. Try the lucene queryparser if that's the case.
> >
> > Otherwise respond with more information about your setup.
> >
> > --
> > - Mark
> >
> > http://www.lucidimagination.com
> >
>


Re: Problem building Nightly Solr

2010-07-06 Thread Ken Krugler


On Jul 6, 2010, at 3:44pm, Chris Hostetter wrote:



: Can you try "ant compile example"?
: After Lucene/Solr merge, solr ant build needs to compile before example
: target.

the "compile" target is already in the dependency tree for the "example"
target, so that won't change anything.

At the moment, the "nightly" snapshots produced by hudson only include the
"solr" section of the "dev" tree -- not modules or the lucene-java
sections. The compiled versions of that code are included, so you can
*run* solr from the hudson artifacts, but apparently you can't compile it.
(this is particularly odd since the nightlies include all the compiled
lucene code as jars in a "lucene-libs/" directory, but the build system
doesn't seem to use that directory ... at least not when compiling solrj).

This is all side effects of trunk still being somewhat in transition --
there are kinks in dealing with the artifacts of the nightly build process
that still need to be worked out -- but if your goal is to compile things
yourself, then you might as well just check out the entire trunk and
compile from that anyway.


Note that you'll need to "ant compile" from the top of the lucene
directory first, before trying any of the solr-specific builds from
inside of the /solr sub-dir. Or at least that's what I ran into when
trying to build a solr dist recently.


-- Ken


Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Re: Problem building Nightly Solr

2010-07-06 Thread Chris Hostetter

: (this is particularly odd since the nightlies include all the compiled 
: lucene code as jars in a "lucene-libs/" directory, but the build system 
: doesn't seem to use that directory ... at least not when compiling solrj).

https://issues.apache.org/jira/browse/SOLR-1989

-Hoss



Re: Problem building Nightly Solr

2010-07-06 Thread Chris Hostetter

: Can you try "ant compile example"?
: After Lucene/Solr merge, solr ant build needs to compile before example
: target.

the "compile" target is already in the dependency tree for the "example" 
target, so that won't change anything.

At the moment, the "nightly" snapshots produced by hudson only include the 
"solr" section of the "dev" tree -- not modules or the lucene-java 
sections. The compiled versions of that code are included, so you can 
*run* solr from the hudson artifacts, but apparently you can't compile it.  
(this is particularly odd since the nightlies include all the compiled 
lucene code as jars in a "lucene-libs/" directory, but the build system 
doesn't seem to use that directory ... at least not when compiling solrj).

This is all side effects of trunk still being somewhat in transition -- 
there are kinks in dealing with the artifacts of the nightly build process 
that still need to be worked out -- but if your goal is to compile things 
yourself, then you might as well just check out the entire trunk and 
compile from that anyway.







-Hoss



Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory

2010-07-06 Thread Jan Høydahl / Cominvent
The Char-filters MUST come before the Tokenizer, due to their nature of 
processing the character-stream and not the tokens.

If you need to apply the accent normalization later in the analysis chain, 
either use ISOLatin1AccentFilterFactory or help with the implementation of 
SOLR-1978.
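
For example, a minimal fieldType sketch (the type name and exact chain are
illustrative):

  <fieldType name="textAccent" class="solr.TextField">
    <analyzer>
      <!-- charFilter runs on the raw character stream, before tokenizing -->
      <charFilter class="solr.MappingCharFilterFactory"
                  mapping="mapping-ISOLatin1Accent.txt"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- token filters run afterwards, in the order listed -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>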

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 5. juli 2010, at 17.32, Saïd Radhouani wrote:

> Thanks Koji for the reply and for updating wiki. As it's written now in wiki, 
> it sounds (at least to me) like MappingCharFilterFactory works only with 
> WhitespaceTokenizerFactory.
> 
> Did you really mean that? Because this filter also works with other 
> tokenizers. For instance, in my text type, I'm using StandardTokenizerFactory 
> for document processing, and WhitespaceTokenizerFactory for query processing.
> 
> I also noticed that, in whatever order you put this filter in the definition 
> of a field type, it's always applied (during text processing) before the 
> tokenizer and all the other filters. Is there a reason for that? Is there a 
> possibility to force the filter to be applied at a certain position among the 
> other filters?
> 
> Thanks,
> -S
> 
> On Jul 5, 2010, at 4:28 PM, Koji Sekiguchi wrote:
> 
>> 
>>> In the same wiki, they say that CharStreamAwareWhitespaceTokenizerFactory 
>>> must be used with MappingCharFilterFactory. But when I use these tokenizer 
>>> and filter together, I get a severe error saying that the field type 
>>> containing this filter and tokenizer is unknown. However, it works when I 
>>> use this filter with StandardTokenizerFactory or WhitespaceTokenizerFactory!
>>> 
>>> 
>> The wiki is not correct today. Before Lucene 2.9 (and Solr 1.4),
>> Tokenizers could only take a Reader argument in the constructor. But after
>> that, because they can take a CharStream argument in the constructor,
>> *CharStreamAware* Tokenizers are no longer needed (all Tokenizers
>> are aware of CharStream). I'll update the wiki.
>> 
>> Koji
>> 
>> -- 
>> http://www.rondhuit.com/en/
>> 
> 



Re: document level security: indexing/searching techniques

2010-07-06 Thread Peter Sturge
Yes, you don't want to hard code permissions into your index - it will give
you headaches.

You might want to have a look at SOLR-1872:
https://issues.apache.org/jira/browse/SOLR-1872 .
This patch provides doc level security through an external ACL mechanism (in
this case, an XML file) controlling a filter query.
This way, you don't need to change the schema - you can even use existing
indexes, and you can change access control without affecting your stored
data.

HTH,
Peter


On Tue, Jul 6, 2010 at 5:16 PM, Ken Krugler wrote:

>
> On Jul 6, 2010, at 8:27am, osocurious2 wrote:
>
>
>> Someone else was recently asking a similar question (or maybe it was you
>> but
>> worded differently :) ).
>>
>> Putting user level security at a document level seems like a recipe for
>> pain. Solr/Lucene don't do frequent update well...and being highly
>> optimized
>> for query, I don't blame them. Is there any way to create a series of
>> roles
>> that you can apply to your documents? If the security level of the
>> document
>> isn't changing, just the user access to them, give the docs a role in the
>> index, put your user/usergroup stuff in a DB or some other system and
>> resolve your user into valid roles, then FilterQuery on role.
>>
>
> You're right, baking in too fine-grained a level of security information is
> a bad idea.
>
> As one example that worked pretty well for code search with Krugle, we set
> access control on a per project level using LDAP groups - ie each project
> had some number of groups that were granted access rights. Each file in the
> project would inherit the same list of groups.
>
> Then, when a user logs in they get authenticated via LDAP, and we have the
> set of groups they belong to being returned by the LDAP server. This then
> becomes a fairly well-bounded list of "terms" for an OR query against the
> "acl-groups" field in each file/project document. Just don't forget to set
> the boost to 0 for that portion of the query :)
>
> -- Ken
>
> 
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g
>
>
>
>
>


Re: Problem building Nightly Solr

2010-07-06 Thread Koji Sekiguchi

(10/07/07 6:25), darknovan...@gmail.com wrote:
I'd like to try the new edismax feature in Solr, so I downloaded the 
latest nightly (apache-solr-4.0-2010-07-05_08-06-42) and tried running 
"ant example". It fails with a missing package error. I've pasted in 
the output below. I tried a nightly from a couple weeks ago, and it 
did the same thing, as did the current svn version. Just to make sure 
it wasn't a problem with my environment, I tried building Solr 1.4.1 
and it worked fine. I'm running java 1.6.0_20 and ant 1.7.1. Is there 
anything I should be doing differently, or is this something that needs 
to get fixed in the builds? Thanks,


Nick

---

nick:/tmp/apache-solr-4.0-2010-07-05_08-06-42$ ant example
Buildfile: build.xml

init-forrest-entities:

dist-contrib:

init:

init-forrest-entities:

compile-lucene:

compile-solrj:
[javac] Compiling 89 source files to 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/build/solrj
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:19: 
package org.apache.lucene.util does not exist

[javac] import org.apache.lucene.util.PriorityQueue;
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:352: 
cannot find symbol

[javac] symbol : class PriorityQueue
[javac] location: class 
org.apache.solr.common.util.ConcurrentLRUCache

[javac] private static class PQueue extends PriorityQueue {
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:319: 
cannot find symbol

[javac] symbol : method size()
[javac] location: class 
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] while (queue.size() > queue.myMaxSize && queue.size() > 0) {
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:319: 
cannot find symbol

[javac] symbol : method size()
[javac] location: class 
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] while (queue.size() > queue.myMaxSize && queue.size() > 0) {
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:320: 
cannot find symbol

[javac] symbol : method pop()
[javac] location: class 
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] CacheEntry otherEntry = (CacheEntry) queue.pop();
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:355: 
non-static variable super cannot be referenced from a static context

[javac] super.initialize(maxSz);
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:355: 
cannot find symbol

[javac] symbol : method initialize(int)
[javac] location: class java.lang.Object
[javac] super.initialize(maxSz);
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:359: 
cannot find symbol

[javac] symbol : variable heap
[javac] location: class 
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] Object[] getValues() { return heap; }
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:368: 
non-static method size() cannot be referenced from a static context

[javac] if (size() < myMaxSize) {
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:369: 
cannot find symbol

[javac] symbol : method add(java.lang.Object)
[javac] location: class 
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] add(element);
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:371: 
non-static method size() cannot be referenced from a static context

[javac] } else if (size() > 0 && !lessThan(element, heap[1])) {
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:371: 
cannot find symbol

[javac] symbol : variable heap
[javac] location: class 
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] } else if (size() > 0 && !lessThan(element, heap[1])) {
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:372: 
cannot find symbol

[javac] symbol : variable heap
[javac] location: class 
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] Object ret = heap[1];
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:373: 
cannot find symbol

[javac] symbol : variable heap
[javac] location: class 
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] heap[1] = element;
[javac] ^
[javac] 
/tmp/apache-solr-4.0-2010-07-05_08

Problem building Nightly Solr

2010-07-06 Thread DarkNovaNick
I'd like to try the new edismax feature in Solr, so I downloaded the latest
nightly (apache-solr-4.0-2010-07-05_08-06-42) and tried running "ant
example". It fails with a missing package error. I've pasted in the output
below. I tried a nightly from a couple weeks ago, and it did the same
thing, as did the current svn version. Just to make sure it wasn't a problem
with my environment, I tried building Solr 1.4.1 and it worked fine. I'm
running java 1.6.0_20 and ant 1.7.1. Is there anything I should be doing
differently, or is this something that needs to get fixed in the builds?
Thanks,


Nick

---

nick:/tmp/apache-solr-4.0-2010-07-05_08-06-42$ ant example
Buildfile: build.xml

init-forrest-entities:

dist-contrib:

init:

init-forrest-entities:

compile-lucene:

compile-solrj:
[javac] Compiling 89 source files to  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/build/solrj
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:19:  
package org.apache.lucene.util does not exist

[javac] import org.apache.lucene.util.PriorityQueue;
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:352:  
cannot find symbol

[javac] symbol : class PriorityQueue
[javac] location: class org.apache.solr.common.util.ConcurrentLRUCache
[javac] private static class PQueue extends PriorityQueue {
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:319:  
cannot find symbol

[javac] symbol : method size()
[javac] location: class  
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] while (queue.size() > queue.myMaxSize && queue.size() > 0) {
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:319:  
cannot find symbol

[javac] symbol : method size()
[javac] location: class  
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] while (queue.size() > queue.myMaxSize && queue.size() > 0) {
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:320:  
cannot find symbol

[javac] symbol : method pop()
[javac] location: class  
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] CacheEntry otherEntry = (CacheEntry) queue.pop();
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:355:  
non-static variable super cannot be referenced from a static context

[javac] super.initialize(maxSz);
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:355:  
cannot find symbol

[javac] symbol : method initialize(int)
[javac] location: class java.lang.Object
[javac] super.initialize(maxSz);
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:359:  
cannot find symbol

[javac] symbol : variable heap
[javac] location: class  
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] Object[] getValues() { return heap; }
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:368:  
non-static method size() cannot be referenced from a static context

[javac] if (size() < myMaxSize) {
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:369:  
cannot find symbol

[javac] symbol : method add(java.lang.Object)
[javac] location: class  
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] add(element);
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:371:  
non-static method size() cannot be referenced from a static context

[javac] } else if (size() > 0 && !lessThan(element, heap[1])) {
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:371:  
cannot find symbol

[javac] symbol : variable heap
[javac] location: class  
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] } else if (size() > 0 && !lessThan(element, heap[1])) {
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:372:  
cannot find symbol

[javac] symbol : variable heap
[javac] location: class  
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] Object ret = heap[1];
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:373:  
cannot find symbol

[javac] symbol : variable heap
[javac] location: class  
org.apache.solr.common.util.ConcurrentLRUCache.PQueue

[javac] heap[1] = element;
[javac] ^
[javac]  
/tmp/apache-solr-4.0-2010-07-05_08-06-

Re: Solr results not updating

2010-07-06 Thread Moazzam Khan
That's exactly what it was. I forgot to commit.

Thanks,

Moazzam

On Tue, Jul 6, 2010 at 3:29 PM, Markus Jelsma  wrote:
> Hi,
>
>
>
> If q=*:* doesn't show your insert, then you forgot the commit:
>
> http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
>
>
>
> Cheers,
>
>
>
> -Original message-
> From: Moazzam Khan 
> Sent: Tue 06-07-2010 22:09
> To: solr-user@lucene.apache.org;
> Subject: Solr results not updating
>
> Hi,
>
> I just successfully inserted a document into Solr, but when I search
> for it, it doesn't show up. Is it a cache issue or something? Is there
> a way to make sure it was inserted properly, and that it's there?
>
> Thanks,
> Moazzam
>


RE: Solr results not updating

2010-07-06 Thread Markus Jelsma
Hi,

 

If q=*:* doesn't show your insert, then you forgot the commit:

http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
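
For example, against the stock example port:

  curl http://localhost:8983/solr/update --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'

or, in SolrJ, call server.commit() after your add.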

 

Cheers,


 
-Original message-
From: Moazzam Khan 
Sent: Tue 06-07-2010 22:09
To: solr-user@lucene.apache.org; 
Subject: Solr results not updating

Hi,

I just successfully inserted a document into Solr, but when I search
for it, it doesn't show up. Is it a cache issue or something? Is there
a way to make sure it was inserted properly, and that it's there?

Thanks,
Moazzam


Re: general debugging techniques?

2010-07-06 Thread Jim Blomo
On Sat, Jul 3, 2010 at 1:10 PM, Lance Norskog  wrote:
> You don't need to optimize, only commit.

OK, thanks for the tip, Lance.  I thought the "too many open files"
problem was because I wasn't optimizing/merging frequently enough.  My
understanding of your suggestion is that commit also does merging, and
since I am only building the index, not querying or updating it, I
don't need to optimize.

> This means that the JVM spends 98% of its time doing garbage
> collection. This means there is not enough memory.

I'll increase the memory to 4G, decrease the documentCache to 5 and try again.
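
(Concretely -- assuming the stock jetty example setup -- that's something
like:

  java -Xmx4g -jar start.jar

and, for the cache, in solrconfig.xml:

  <documentCache class="solr.LRUCache" size="5" initialSize="5" autowarmCount="0"/>
)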

> I made a mistake - the bug in Lucene is not about PDFs - it happens
> with every field in every document you index in any way- so doing this
> in Tika outside Solr does not help. The only trick I can think of is
> to alternate between indexing large and small documents. This way the
> bug does not need memory for two giant documents in a row.

I've checked out and built solr from branch_3x with the
tika-0.8-SNAPSHOT patch.  (Earlier I was having trouble with Tika
crashing too frequently.)  I've confirmed that LUCENE-2387 is fixed in
this branch so hopefully I won't run into that this time.

> Also, do not query the indexer at all. If you must, don't do sorted or
> faceting requests. These eat up a lot of memory that is only freed
> with the next commit (index reload).

Good to know, though I have not been querying the index and definitely
haven't ventured into faceted requests yet.

The advice is much appreciated,

Jim


Solr results not updating

2010-07-06 Thread Moazzam Khan
Hi,

I just successfully inserted a document into Solr, but when I search
for it, it doesn't show up. Is it a cache issue or something? Is there
a way to make sure it was inserted properly, and that it's there?

Thanks,
Moazzam


Re: using DataImport Dev Console: no errors, but no documents

2010-07-06 Thread Chris Hostetter

: It fetches 5322 rows but doesn't process any documents and doesn't 
: populate the index.  Any suggestions would be appreciated.

I don't know much about DIH, but it seems weird that both of your entities 
say 'rootEntity="false"'

looking at the docs, that definitely doesn't seem like what you want...

http://wiki.apache.org/solr/DataImportHandler

>> rootEntity : By default the entities falling under the document are 
>> root entities. If it is set to false , the entity directly falling 
>> under that  entity will be treated as the root entity (so on and so 
>> forth). For every  row returned by the root entity a document is 
>> created in Solr 
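
For illustration (names made up), with a config like:

  <document>
    <entity name="outer" rootEntity="false" query="...">
      <entity name="inner" query="...">
      </entity>
    </entity>
  </document>

each row returned by "inner" becomes a Solr document; with the default
rootEntity="true" on "outer", each row of "outer" would instead.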



-Hoss



Re: DatImportHandler and cron issue

2010-07-06 Thread Chris Hostetter
: What we are seeing is the request is dispatched to solr server,but its not
: being processed.

you'll have to explain what you mean by "not being processed" ?

According to your logs, DIH is in fact working and logging its 
progress...

: 2010-06-14 12:51:01,328 INFO  [org.apache.solr.core.SolrCore]
: (http-0.0.0.0-8080-1) [npmetrosearch_statesman] webapp=/solr
: path=/dataimport
: 
params={site=statesman&forDate=03/24/10&articleTypes=story,slideshow,video,poll,specialArticle,list&clean=false&commit=true&entity=initialLoad&command=full-import&numArticles=-1&server=app5}
: status=0 QTime=0 
: 2010-06-14 12:51:01,329 INFO 
: [org.apache.solr.handler.dataimport.DataImporter] (Thread-378) Starting Full
: Import
: 2010-06-14 12:51:01,332 INFO 
: [org.apache.solr.handler.dataimport.SolrWriter] (Thread-378) Read
: dataimport.properties
: 2010-06-14 12:51:01,425 INFO 
: [org.apache.solr.handler.dataimport.DocBuilder] (Thread-378) Time taken =
: 0:0:0.93


-Hoss



Re: proximity question

2010-07-06 Thread Ahmet Arslan
> Will quotes do an exact match within
> a proximity test? 

No.

> If not, does anybody know how to accomplish this?

It is not supported out-of-the-box. You need to plug in Lucene's 
XmlQueryParser or SurroundQueryParser. Similar discussion:
http://search-lucene.com/m/PO3iXKRuAv1/




  


proximity question

2010-07-06 Thread mike anderson
Will quotes do an exact match within a proximity test? For instance

body:""mountain goat" grass"~10

should match:

"the mountain goat went up the hill to eat grass"

but should NOT match

"the mountain where the goat lives is covered in grass"


If not, does anybody know how to accomplish this?


Thanks,
Mike Anderson


Re: Deleting Terms:

2010-07-06 Thread Kumaravel Kandasami
FYI - the optimize() operation solved the issue.


Kumar_/|\_
www.saisk.com
ku...@saisk.com
"making a profound difference with knowledge and creativity..."


On Tue, Jul 6, 2010 at 11:47 AM, Kumaravel Kandasami <
kumaravel.kandas...@gmail.com> wrote:

> BTW, Using SOLRJ - javabin api.
>
>
>
> Kumar_/|\_
> www.saisk.com
> ku...@saisk.com
> "making a profound difference with knowledge and creativity..."
>
>
> On Tue, Jul 6, 2010 at 11:43 AM, Kumaravel Kandasami <
> kumaravel.kandas...@gmail.com> wrote:
>
>> Hi,
>>
>>How to delete the terms associated with the document ?
>>
>> Current scenario: We are deleting documents based on a query
>> ('field:value').
>> The documents are getting deleted, however, the old terms associated to
>> the field are displayed in the admin.
>>
>> How do we make SOLR to re-evaluate and update the terms associated to a
>> specific fields or latest updated document ?
>>
>> (I am assuming we are missing some api calls .)
>>
>> Thank you.
>>
>>
>> Kumar_/|\_
>> www.saisk.com
>> ku...@saisk.com
>> "making a profound difference with knowledge and creativity..."
>>
>
>


Re: Deleting Terms:

2010-07-06 Thread Kumaravel Kandasami
BTW, Using SOLRJ - javabin api.


Kumar_/|\_
www.saisk.com
ku...@saisk.com
"making a profound difference with knowledge and creativity..."


On Tue, Jul 6, 2010 at 11:43 AM, Kumaravel Kandasami <
kumaravel.kandas...@gmail.com> wrote:

> Hi,
>
>How to delete the terms associated with the document ?
>
> Current scenario: We are deleting documents based on a query
> ('field:value').
> The documents are getting deleted, however, the old terms associated to the
> field are displayed in the admin.
>
> How do we make SOLR to re-evaluate and update the terms associated to a
> specific fields or latest updated document ?
>
> (I am assuming we are missing some api calls .)
>
> Thank you.
>
>
> Kumar_/|\_
> www.saisk.com
> ku...@saisk.com
> "making a profound difference with knowledge and creativity..."
>


Deleting Terms:

2010-07-06 Thread Kumaravel Kandasami
Hi,

   How to delete the terms associated with the document ?

Current scenario: We are deleting documents based on a query
('field:value').
The documents are getting deleted, however, the old terms associated to the
field are displayed in the admin.

How do we make SOLR to re-evaluate and update the terms associated to a
specific fields or latest updated document ?

(I am assuming we are missing some api calls .)

Thank you.


Kumar_/|\_
www.saisk.com
ku...@saisk.com
"making a profound difference with knowledge and creativity..."


Re: document level security: indexing/searching techniques

2010-07-06 Thread Ken Krugler


On Jul 6, 2010, at 8:27am, osocurious2 wrote:

> Someone else was recently asking a similar question (or maybe it was you
> but worded differently :) ).
>
> Putting user level security at a document level seems like a recipe for
> pain. Solr/Lucene don't do frequent update well...and being highly
> optimized for query, I don't blame them. Is there any way to create a
> series of roles that you can apply to your documents? If the security
> level of the document isn't changing, just the user access to them, give
> the docs a role in the index, put your user/usergroup stuff in a DB or
> some other system and resolve your user into valid roles, then
> FilterQuery on role.

You're right, baking in too fine-grained a level of security information
is a bad idea.

As one example that worked pretty well for code search with Krugle, we
set access control on a per project level using LDAP groups - ie each
project had some number of groups that were granted access rights.
Each file in the project would inherit the same list of groups.

Then, when a user logs in they get authenticated via LDAP, and we have
the set of groups they belong to being returned by the LDAP server.
This then becomes a fairly well-bounded list of "terms" for an OR
query against the "acl-groups" field in each file/project document.
Just don't forget to set the boost to 0 for that portion of the query :)
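
In Lucene query syntax that portion ends up looking something like this
(field and group names are made up):

  +acl_groups:(eng sales support)^0 +content:(user query here)

where the ^0 keeps the group clause from contributing to the relevance
score while still restricting the result set.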


-- Ken


Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Relevancy and non-matching words

2010-07-06 Thread dbashford

Is there some sort of threshold that I can tweak which sets how many letters
in non-matching words makes a result more or less relevant?

Searching on title, q=fantasy football, and I get this:

{"title":"The Fantasy Football Guys",
"score":2.8387074},
{"title":"Fantasy Football Bums",
"score":2.8387074},
{"title":"Fantasy Football Xtreme",
"score":2.7019854},
{"title":"Fantasy Football Fools",
"score":2.7019634},
{"title":"Fantasy Football Brothers",
"score":2.5917912}

(I have some other scoring things in there that account for the difference
between Xtreme and Fools.)

The behavior I'm noticing is that there is some threshold for the length of
non-matching words that, when tripped, kicks the score down a notch. 4 to 5
letters seems to trip one threshold, 6 to 7 another.

I would really like something like "Bums" to score the same as "Xtreme" and
"Brothers" and let my other criteria determine which document should come
out on top. Is there something that can be tweaked to get this to happen?

Or is my assumption a bit off base?




Re: document level security: indexing/searching techniques

2010-07-06 Thread osocurious2

Someone else was recently asking a similar question (or maybe it was you but
worded differently :) ).

Putting user level security at a document level seems like a recipe for
pain. Solr/Lucene don't do frequent update well...and being highly optimized
for query, I don't blame them. Is there any way to create a series of roles
that you can apply to your documents? If the security level of the document
isn't changing, just the user access to them, give the docs a role in the
index, put your user/usergroup stuff in a DB or some other system and
resolve your user into valid roles, then FilterQuery on role.  


Adding new elements to index

2010-07-06 Thread Xavier Rodriguez
Hi,

I have a SOLR installed on a Tomcat application server. This solr instance
has some data indexed from a postgres database. Now I need to add some
entities from an Oracle database. When I run the full-import command, the
documents indexed are only documents from postgres. In fact, if I have 200
rows indexed from postgres and 100 rows from Oracle, the full-import process
only indexes 200 documents from oracle, although it shows clearly that the
query returned 300 rows.

I'm not doing a delta-import, simply a full import. I've tried to clean the
index, reload the configuration, and manually remove dataimport.properties
because it's the only metadata I found.  Is there any other file to check or
modify just to get all 300 rows indexed?

Of course, I tried to find one of that oracle fields, with no results.

Thanks a lot,

Xavier Rodriguez.


Re: Wildcards queries

2010-07-06 Thread RL

Hi,

a bit more information would help to identify what's the problem in your
case.

but in general these facts come to mind:
- leading wildcard queries are not available in solr (without extending the
QueryParser).
- no text analysis is performed on the search term when using wildcards, so
you need to make sure that the search field isn't configured to be stemmed,
and that you are not searching for an upper-case term when your text was
lowercased through analysis (see the example below).
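
For example (field name is illustrative): if the "name" field is lowercased
at index time, q=name:Apple* matches nothing, while q=name:apple* works --
so lowercase the term in your application before adding the wildcard.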

hope that helps a little bit


document level security: indexing/searching techniques

2010-07-06 Thread RL

I've a question about indexing/searching techniques in relation to document
level security.
I'm planning a system that has, let's say, about 1 million search documents
with about 100 search fields each. Most of them are unstored to keep the
index size low, because some of them can contain a few kilobytes and some
several hundred kilobytes. Two of these search fields are for permission
checking, where I keep the explicitly allowed and explicitly disallowed
users and usergroups. (Usergroups can be in a hierarchical structure with
permission inheritance.)

So when a user searches in the system, his user id and the ids of his
usergroup memberships are added as a filter query in my application logic
before the query is sent to solr. So far so good for the searching part.
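
For example (field and id names are made up), a request might carry
something like:

  fq=(allowed_users:u42 OR allowed_groups:(g7 OR g12)) AND NOT disallowed_users:u42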

But the problem is that the permissions can be changed by administrators
of the system, requiring a re-index of the two permission search fields.

first idea:
Partial updates of index entries are not possible, so I would need to fetch
all 1 million documents from a database and re-index them just because some
permissions changed. The fetching process is rather expensive and requires
more than 14 hours. I am sure this can be optimized of course, but I would
rather avoid re-indexing all content.

second idea:
Another idea would be to store just the permissions in one small and
fast-to-update index, and all the other stuff in the other huge and
not-so-often-updated index. But I didn't find any possibility to combine
these two indices in one query. Is that even possible?


Does somebody have experience with these topics, or can anyone give advice
on how to solve this properly?
Thanks in advance.



Re: Wildcards queries

2010-07-06 Thread Robert Naczinski
Hi,

thanks for the reply. I am an absolute beginner with Solr.

I have taken, for the beginning, the configuration from
{solr.home}example/solr .

In solrconfig.xml all the query parsers are commented out ;-( Where can I
find the QueryParser? Javadoc, wiki?

Regards,

Robert

2010/7/6 Mark Miller :
> On 7/6/10 8:53 AM, Robert Naczinski wrote:
>> Hi,
>>
>> we use in our application EmbeddedSolrServer.
>
> Great!
>
>> Everything went fine.
>
> Excellent!
>
>> Now I want use wildcards queries.
>
> Cool!
>
>>
>> It does not work.
>
> Bummer!
>
>> Must be adapted for the schema.xml?
>
> Not necessarily...
>
>>
>> Can someone help me?
>
> We can try!
>
>>In wiki, I find nothing?
>
> No, you will find lots!
>
>> Why do I need simple
>> example or link.
>
> Because it would be helpful!
>
>
>>
>> Regards,
>>
>> Robert
>
>
> What query parser are you using? Dismax? That query parser does not
> support wildcards. Try the lucene queryparser if that's the case.
>
> Otherwise respond with more information about your setup.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>


Re: solr with hadoop

2010-07-06 Thread Jason Rutherglen
> If you do distributed indexing correctly, what about updating the documents
> and what about replicating them correctly?

Yes, you can do both and it'll work great.

On Mon, Jul 5, 2010 at 7:42 AM, MitchK  wrote:
>
> I need to revive this discussion...
>
> If you do distributed indexing correctly, what about updating the documents
> and what about replicating them correctly?
>
> Does this work? Or wasn't this an issue?
>
> Kind regards
> - Mitch
>


Re: Wildcards queries

2010-07-06 Thread Mark Miller
On 7/6/10 8:53 AM, Robert Naczinski wrote:
> Hi,
> 
> we use in our application EmbeddedSolrServer. 

Great!

> Everything went fine.

Excellent!

> Now I want use wildcards queries.

Cool!

> 
> It does not work. 

Bummer!

> Must be adapted for the schema.xml?

Not necessarily...

> 
> Can someone help me? 

We can try!

>In wiki, I find nothing? 

No, you will find lots!

> Why do I need simple
> example or link.

Because it would be helpful!


> 
> Regards,
> 
> Robert


What query parser are you using? Dismax? That query parser does not
support wildcards. Try the lucene queryparser if that's the case.
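
For example (field name made up), wildcards work with the standard parser:

  q={!lucene}name:wild*

or set defType=lucene on the request instead of dismax.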

Otherwise respond with more information about your setup.

-- 
- Mark

http://www.lucidimagination.com


Wildcards queries

2010-07-06 Thread Robert Naczinski
Hi,

we use in our application EmbeddedSolrServer. Everything went fine.
Now I want use wildcards queries.

It does not work. Must be adapted for the schema.xml?

Can someone help me? In wiki, I find nothing? Why do I need simple
example or link.

Regards,

Robert


Re: Data Import Handler Rich Format Documents

2010-07-06 Thread Tod

On 6/28/2010 8:28 AM, Alexey Serba wrote:

Ok, I'm trying to integrate the TikaEntityProcessor as suggested. I'm using
Solr Version: 1.4.0 and getting the following error:

java.lang.ClassNotFoundException: Unable to load BinURLDataSource or
org.apache.solr.handler.dataimport.BinURLDataSource

It seems that DIH-Tika integration is not a part of Solr 1.4.0/1.4.1
release. You should use trunk / nightly builds.
https://issues.apache.org/jira/browse/SOLR-1583



Thanks, that would explain things - I'm using a stock 1.4.0 download.



My data-config.xml looks like this:


  [the dataSource and entity XML was stripped by the mail archive; the only
  fragment that survived is the Tika entity's url attribute:]

    url="http://www.mysite.com/${my_database.content_url}"


I added the entity name="my_database_url" section to an existing (working)
database entity to be able to have Tika index the content pointed to by the
content_url.

Is there anything obviously wrong with what I've tried so far?


I think you should move the Tika entity into the my_database entity and
simplify the whole configuration:


  [suggested config likewise stripped by the archive; the Tika entity's url
  attribute was:]

    url="http://www.mysite.com/${my_database.content_url}"





This, I guess, would be after I checked out and built from trunk?


Thanks - Tod


Re: Duplicate items in distributed search

2010-07-06 Thread Erik Hatcher


On Jul 4, 2010, at 5:10 PM, Andrew Clegg wrote:

> Mark Miller-3 wrote:
>> On 7/4/10 12:49 PM, Andrew Clegg wrote:
>>> I thought so but thanks for clarifying. Maybe a wording change on the wiki
>>
>> Sounds like a good idea - go ahead and make the change if you'd like.
>
> That page seems to be marked immutable...

You have to create an account and log in in order to edit wiki pages.

	Erik



Re: problem with formulating a negative query

2010-07-06 Thread Sascha Szott

Hi,

Chris Hostetter wrote:

AND, OR, and NOT are just syntactic sugar for the MUST, MUST_NOT, and
SHOULD modifiers. The default op of "OR" only affects the first clause of
your query (R) because it doesn't have any modifiers --

Thanks for pointing that out!

-Sascha


the second clause has that NOT modifier so your query is effectivley...

topic:R -topic:[* TO *]

...which by definition can't match anything.
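
(if the intent was "docs with topic R, or docs with no topic at all", the
usual workaround is to anchor the negative clause to the full doc set:

  topic:R (*:* -topic:[* TO *])
)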

-Hoss