Re: Solr and DateTimes - bug?

2011-09-12 Thread Nicklas Overgaard

Hi Mauricio,

Thanks for the suggestions :) I'm already running mono 2.10.5 so I
should be safe.


And thanks to everybody for the quick answers and friendly attitude.

Best regards,

Nicklas

On 2011-09-13 03:01, Mauricio Scheffer wrote:

Hi Nicklas,
Use a nullable DateTime type instead of MinValue. It's semantically more
correct, and SolrNet will do the right mapping.
I also heard that Mono had a bug in date parsing; it didn't behave the same
as .NET:
https://github.com/mausch/SolrNet/commit/f3a76ea5535633f4b301e644e25eb2dc7f0cb7ef
IIRC this bug was fixed in Mono 2.10 or so, so make sure you're running the
latest version.
Finally, there's a specific mailing list for questions about SolrNet:
http://groups.google.com/group/solrnet

Cheers,
Mauricio



On Mon, Sep 12, 2011 at 7:54 AM, Nicklas Overgaard wrote:


I see. I'm using that date to flag that my entity "has not yet ended". I
can just use another constant which Solr is capable of returning in the
correct format. The nice thing about DateTime.MinValue is that it's just
part of the .net framework :)

Hope that the issue is resolved at some point.

I'm wondering if it would be possible for you (or someone else) to fix the
issue with years from 1 to 999 being formatted incorrectly, and then create
a new ticket for the issue with negative years?

Best regards,

Nicklas


On 2011-09-12 07:02, Chris Hostetter wrote:


: The XML output when performing a query via the solr interface is like
this:
: 1-01-01T00:00:00Z

i think you mean: 1-01-01T00:00:00Z

:>   >   So my question is: Is this a bug in the solr output engine, or
should mono
:>   >   be able to parse the date as given from solr? I have not yet tried
it out
:>   >   on .net as I do not have access to a windows machine at the moment.

it is in fact a bug in Solr that not a lot of people have been overly
concerned with, since most people don't deal with dates that far back

https://issues.apache.org/jira/browse/SOLR-1899

...I spent a little time working on it at one point but got sidetracked
by other things, since there are a couple of related issues with the
canonical ISO 8601 date format around year "0" that made it non-obvious
what the "ideal" solution was.

-Hoss







Re: question about Field Collapsing/ grouping

2011-09-12 Thread Jayendra Patil
At the time we implemented the feature, there was no straightforward solution.

What we did was facet on the grouped-by field and count the facets.
This gives you the distinct count for the groups.

You may also want to check the patch @
https://issues.apache.org/jira/browse/SOLR-2242, which will return the
facet counts; you then need to count them yourself.
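
A rough SolrJ sketch of that facet-counting approach (the server URL, the 3.x
CommonsHttpSolrServer client, and the "Industry" field from the question
quoted below are assumptions, not tested code):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistinctGroupCount {
    public static void main(String[] args) throws Exception {
        // point this at your own Solr instance
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("+(Content:is Content:the)");
        query.setRows(0);            // we only need facet counts, not docs
        query.setFacet(true);
        query.addFacetField("Industry");
        query.setFacetMinCount(1);   // only groups that actually matched
        query.setFacetLimit(-1);     // no cap on the number of facet values

        QueryResponse rsp = server.query(query);
        FacetField industries = rsp.getFacetField("Industry");
        // each facet value is one distinct group matching the query
        System.out.println("distinct industries: " + industries.getValueCount());
    }
}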

Regards,
Jayendra

On Tue, Sep 13, 2011 at 1:27 AM, Ahson Iqbal  wrote:
> Hi
>
> Is it possible to get the number of groups that matched a specified query?
>
> Say there are three fields in the index:
>
> DocumentID
> Content
> Industry
>
>
> and now I want to query as +(Content:is Content:the)
> group=true&group.field=industry
>
> Now, is it possible to get how many industries matched the query?
>
> Please help.
>
> Regards
> Ahsan
>


question about Field Collapsing/ grouping

2011-09-12 Thread Ahson Iqbal
Hi

Is it possible to get the number of groups that matched a specified query?

Say there are three fields in the index:

DocumentID
Content
Industry


and now I want to query as +(Content:is Content:the)
group=true&group.field=industry

Now, is it possible to get how many industries matched the query?

Please help.

Regards
Ahsan


Re: Re; DIH Scheduling

2011-09-12 Thread Bill Bell
You can easily use cron with curl to do what you want to do.

On 9/12/11 2:47 PM, "Pulkit Singhal"  wrote:

>I don't see anywhere in:
>http://issues.apache.org/jira/browse/SOLR-2305
>any statement that shows the code's inclusion was "decided against".
>When did this happen, and what is needed from the community before
>someone with the powers to do so will actually commit this?
>
>2011/6/24 Noble Paul നോബിള്‍ नोब्ळ् 
>
>> On Thu, Jun 23, 2011 at 9:13 PM, simon  wrote:
>> > The Wiki page describes a design for a scheduler, which has not been
>> > committed to Solr yet (I checked). I did see a patch the other day
>> > (see https://issues.apache.org/jira/browse/SOLR-2305) but it didn't
>> > look well tested.
>> >
>> > I think that you're basically stuck with something like cron at this
>> > time. If your application is written in java, take a look at the
>> > Quartz scheduler - http://www.quartz-scheduler.org/
>>
>> It was considered and decided against.
>> >
>> > -Simon
>> >
>>
>>
>>
>> --
>> -
>> Noble Paul
>>
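
A minimal sketch of the Quartz route Simon mentions above, for a Java app that
wants to kick off DIH on a schedule (this assumes the Quartz 1.x API and an
example URL/cron expression, so treat it as an untested sketch):

import java.io.InputStream;
import java.net.URL;

import org.quartz.CronTrigger;
import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.Scheduler;
import org.quartz.impl.StdSchedulerFactory;

public class DihImportJob implements Job {
    public void execute(JobExecutionContext ctx) throws JobExecutionException {
        try {
            // same effect as cron + curl: hit the DataImportHandler URL
            URL url = new URL("http://localhost:8983/solr/dataimport?command=full-import");
            InputStream in = url.openStream();
            in.close();
        } catch (Exception e) {
            throw new JobExecutionException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = new JobDetail("dihImport", Scheduler.DEFAULT_GROUP,
                DihImportJob.class);
        // fire every night at 2am
        CronTrigger trigger = new CronTrigger("dihTrigger", Scheduler.DEFAULT_GROUP,
                "0 0 2 * * ?");
        scheduler.start();
        scheduler.scheduleJob(job, trigger);
    }
}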




Re: indexing data from rich documents - Tika with solr3.1

2011-09-12 Thread scorpking
Hi,
Can you explain this problem to me?
I have indexed data from multiple files using the Tika libs, and I have indexed
data over HTTP, but only one file at a time (e.g. http://myweb/filename.pdf). Now I
have many files under an HTTP path (e.g. http://myweb/files/). I tried to
index data from the HTTP path but it does not work. Here is my data-config:

*

<dataConfig>
  <document>
    <entity processor="FileListEntityProcessor"
            baseDir="http://www.lc.unsw.edu.au/onlib/pdf/"
            recursive="true" rootEntity="false"
            transformer="DateFormatTransformer">
      ...
    </entity>
  </document>
</dataConfig>

*

Error: 
Full Import
failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
'baseDir' value: http://www.lc.unsw.edu.au/onlib/pdf/ is not a directory
Processing Document # 1
at
org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:124)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:69)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:552)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)

Thanks for your help.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3331651.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr and DateTimes - bug?

2011-09-12 Thread Mauricio Scheffer
Hi Nicklas,
Use a nullable DateTime type instead of MinValue. It's semantically more
correct, and SolrNet will do the right mapping.
I also heard that Mono had a bug in date parsing; it didn't behave the same
as .NET:
https://github.com/mausch/SolrNet/commit/f3a76ea5535633f4b301e644e25eb2dc7f0cb7ef
IIRC this bug was fixed in Mono 2.10 or so, so make sure you're running the
latest version.
Finally, there's a specific mailing list for questions about SolrNet:
http://groups.google.com/group/solrnet

Cheers,
Mauricio



On Mon, Sep 12, 2011 at 7:54 AM, Nicklas Overgaard wrote:

> I see. I'm using that date to flag that my entity "has not yet ended". I
> can just use another constant which Solr is capable of returning in the
> correct format. The nice thing about DateTime.MinValue is that it's just
> part of the .net framework :)
>
> Hope that the issue is resolved at some point.
>
> I'm wondering if it would be possible for you (or someone else) to fix the
> issue with years from 1 to 999 being formatted incorrectly, and then create
> a new ticket for the issue with negative years?
>
> Best regards,
>
> Nicklas
>
>
> On 2011-09-12 07:02, Chris Hostetter wrote:
>
>> : The XML output when performing a query via the solr interface is like
>> this:
>> : 1-01-01T00:00:00Z
>>
>> i think you mean: 1-01-01T00:00:00Z
>>
>> :>  >  So my question is: Is this a bug in the solr output engine, or
>> should mono
>> :>  >  be able to parse the date as given from solr? I have not yet tried
>> it out
>> :>  >  on .net as I do not have access to a windows machine at the moment.
>>
>> it is in fact a bug in Solr that not a lot of people have been overly
>> concerned with, since most people don't deal with dates that far back
>>
>> https://issues.apache.org/jira/browse/SOLR-1899
>>
>> ...I spent a little time working on it at one point but got sidetracked
>> by other things, since there are a couple of related issues with the
>> canonical ISO 8601 date format around year "0" that made it non-obvious
>> what the "ideal" solution was.
>>
>> -Hoss
>>
>
>


RE: Weird behaviors with not operators.

2011-09-12 Thread Patrick Sauts
I mean it's a known bug.

Hostetter AND (-chris *:*)

should do the trick, depending on your request:

NAME:(-chris *:*)
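
For SolrJ users, a minimal sketch of that workaround (the server URL and field
name are placeholders, not from the original message):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class PureNegativeClause {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        // a nested pure-negative clause like NAME:(-chris) matches nothing,
        // so pair the negation with an explicit match-all *:*
        SolrQuery query = new SolrQuery("NAME:(-chris *:*)");
        System.out.println(server.query(query).getResults().getNumFound());
    }
}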

-Original Message-
From: Patrick Sauts [mailto:patrick.via...@gmail.com] 
Sent: Monday, September 12, 2011 3:57 PM
To: solr-user@lucene.apache.org
Subject: RE: Weird behaviors with not operators.

Maybe this will answer your question
http://wiki.apache.org/solr/FAQ

Why does 'foo AND -baz' match docs, but 'foo AND (-bar)' doesn't ?

Boolean queries must have at least one "positive" expression (ie: MUST or
SHOULD) in order to match. Solr tries to help with this, and if asked to
execute a BooleanQuery that contains only negated clauses _at the
topmost level_, it adds a match-all-docs query (ie: *:*)

If the top level BooleanQuery contains somewhere inside of it a nested
BooleanQuery which contains only negated clauses, that nested query will not
be modified, and it (by definition) can't match any documents -- if it is
required, that means the outer query will not match.

More Detail:

*  https://issues.apache.org/jira/browse/SOLR-80
*
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201006.mbox/%3Calpine.deb.1.10.1006011609080.29...@radix.cryptio.net%3E


Patrick.
-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Monday, September 12, 2011 3:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Weird behaviors with not operators.


: I'm crashing into a weird behavior with - operators.

I went ahead and added a FAQ on this using some text from a previous nearly
identical email ...

https://wiki.apache.org/solr/FAQ#Why_does_.27foo_AND_-baz.27_match_docs.2C_b
ut_.27foo_AND_.28-bar.29.27_doesn.27t_.3F

please reply if you have followup questions.


-Hoss




RE: Weird behaviors with not operators.

2011-09-12 Thread Patrick Sauts
Maybe this will answer your question
http://wiki.apache.org/solr/FAQ

Why does 'foo AND -baz' match docs, but 'foo AND (-bar)' doesn't ?

Boolean queries must have at least one "positive" expression (ie: MUST or
SHOULD) in order to match. Solr tries to help with this, and if asked to
execute a BooleanQuery that contains only negated clauses _at the
topmost level_, it adds a match-all-docs query (ie: *:*)

If the top level BooleanQuery contains somewhere inside of it a nested
BooleanQuery which contains only negated clauses, that nested query will not
be modified, and it (by definition) can't match any documents -- if it is
required, that means the outer query will not match.

More Detail:

*  https://issues.apache.org/jira/browse/SOLR-80
*
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201006.mbox/%3Calpine.deb.1.10.1006011609080.29...@radix.cryptio.net%3E


Patrick.
-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Monday, September 12, 2011 3:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Weird behaviors with not operators.


: I'm crashing into a weird behavior with - operators.

I went ahead and added a FAQ on this using some text from a previous nearly
identical email ...

https://wiki.apache.org/solr/FAQ#Why_does_.27foo_AND_-baz.27_match_docs.2C_b
ut_.27foo_AND_.28-bar.29.27_doesn.27t_.3F

please reply if you have followup questions.


-Hoss



Re: Weird behaviors with not operators.

2011-09-12 Thread Chris Hostetter

: I'm crashing into a weird behavior with - operators.

I went ahead and added a FAQ on this using some text from a previous 
nearly identical email ...

https://wiki.apache.org/solr/FAQ#Why_does_.27foo_AND_-baz.27_match_docs.2C_but_.27foo_AND_.28-bar.29.27_doesn.27t_.3F

please reply if you have followup questions.


-Hoss


How to return a function result instead of doclist in the Solr collapsing/grouping feature?

2011-09-12 Thread Pablo Ricco
I have the following solr fields in schema.xml:

   - id (string)
   - name (string)
   - category(string)
   - latitude (double)
   - longitude(double)

Is it possible to make a query that groups by category and returns the
average of latitude and longitude instead of the doclist?

Thanks,
Pablo


Re: Solr: Return field names that contain search term

2011-09-12 Thread Rahul Warawdekar
Thanks Chris !

Will try out the second approach you suggested and share my findings.

On Mon, Sep 12, 2011 at 5:03 PM, Chris Hostetter
wrote:

>
> : > Would highly appreciate if someone can suggest other efficient ways to
> : > address this kind of a requirement.
>
> one approach would be to index each attachment as its own document and
> search those.  you could then use things like the group collapsing
> features to return only the "main" type documents when multiple
> attachments match.
>
> similarly: you could still index each "main" document with a giant
> text field containing all of the attachment text, *and* you could index
> each attachment as its own document.  You would search on the main docs
> as you do now, but then your app could issue a secondary request searching
> for all "attachment" docs that match on one of the main docIds in a
> special field, and use the results to note which attachment of each doc
> (if any) caused the match.
>
> -Hoss
>



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Solr: Return field names that contain search term

2011-09-12 Thread Chris Hostetter

: > Would highly appreciate if someone can suggest other efficient ways to
: > address this kind of a requirement.

one approach would be to index each attachment as its own document and 
search those.  you could then use things like the group collapsing 
features to return only the "main" type documents when multiple 
attachments match.

similarly: you could still index each "main" document with a giant 
text field containing all of the attachment text, *and* you could index 
each attachment as its own document.  You would search on the main docs 
as you do now, but then your app could issue a secondary request searching 
for all "attachment" docs that match on one of the main docIds in a 
special field, and use the results to note which attachment of each doc 
(if any) caused the match.
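
A rough SolrJ sketch of that secondary request (the "doc_type" and
"main_doc_id" field names are made up for illustration; adapt them to your
own schema):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class AttachmentLookup {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        // suppose the first query returned main docs 42 and 97; now ask
        // which of their attachment docs matched the user's term
        SolrQuery query = new SolrQuery("solr");
        query.addFilterQuery("doc_type:attachment");
        query.addFilterQuery("main_doc_id:(42 OR 97)");
        query.setFields("id", "main_doc_id");

        QueryResponse rsp = server.query(query);
        for (SolrDocument doc : rsp.getResults()) {
            System.out.println("matched attachment " + doc.getFieldValue("id")
                    + " of main doc " + doc.getFieldValue("main_doc_id"));
        }
    }
}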

-Hoss


Re: Re; DIH Scheduling

2011-09-12 Thread Pulkit Singhal
I don't see anywhere in:
http://issues.apache.org/jira/browse/SOLR-2305
any statement that shows the code's inclusion was "decided against".
When did this happen, and what is needed from the community before
someone with the powers to do so will actually commit this?

2011/6/24 Noble Paul നോബിള്‍ नोब्ळ् 

> On Thu, Jun 23, 2011 at 9:13 PM, simon  wrote:
> > The Wiki page describes a design for a scheduler, which has not been
> > committed to Solr yet (I checked). I did see a patch the other day
> > (see https://issues.apache.org/jira/browse/SOLR-2305) but it didn't
> > look well tested.
> >
> > I think that you're basically stuck with something like cron at this
> > time. If your application is written in java, take a look at the
> > Quartz scheduler - http://www.quartz-scheduler.org/
>
> It was considered and decided against.
> >
> > -Simon
> >
>
>
>
> --
> -
> Noble Paul
>


Re: pagination with grouping

2011-09-12 Thread alxsss
Is case #2 planned to be coded in the future releases?

Thanks.
Alex.
-Original Message-
From: Bill Bell 
To: solr-user 
Sent: Thu, Sep 8, 2011 10:17 pm
Subject: Re: pagination with grouping


There are 2 use cases:

1. rows=10 means 10 groups.
2. rows=10 means 10 results (regardless of groups).

I thought there was a total number of groups (ngroups) for case #1.

I don't believe case #2 has been coded.

On 9/8/11 2:22 PM, "alx...@aim.com"  wrote:

>
> 
>
> Hello,
>
>When trying to implement pagination as in the case without grouping I see
>two issues.
>1. with rows=10 solr feed displays 10 groups not 10 results
>2. there is no total number of results with grouping  to show the last
>page.
>
>In detail:
>1. I need to display only 10 results in one page. For example if I have
>group.limit=5 and the first group has 5 docs, the second 3 and the third
>2, then only these 3 groups must be displayed in the first page.
>Currently, specifying rows=10 shows 10 groups, and if we have 5 docs in
>each group then in the first page we will have 50 docs.
>
>2. I need to show the last page, for which I need the total number of results
>with grouping. For example, if I have 5 groups with 5, 4, 3, 2, and 1 docs,
>then this total number must be 15.
>
>Any ideas how to achieve this.
>
>Thanks in advance.
>Alex.
>
>
>



 


[Commercial training announcement] Lucene training at Lucene EuroCon, Barcelona - Oct. 17,18, 2011

2011-09-12 Thread Erik Hatcher
http://www.lucidimagination.com/blog/2011/09/12/learn-lucene/ - pasted below too

Hi everyone... I'm not usually much on advertising/hyping events where I speak 
and teach, but I'm really interested in drumming up a solid attendance for our 
Lucene training that I'll be teaching at Lucene EuroCon in Barcelona next 
month.  We always fill up the Solr trainings, but we all know that Lucene is 
the heart of Solr and I'm happy to be immersing myself once again at the Lucene 
layer to teach this class.

I'm looking forward to seeing some of you next month at our very exciting 
EuroCon event! - http://2011.lucene-eurocon.org/pages/training#lucene-workshop

Erik



You’re using Solr, or some other Lucene-based search solution, … or you should 
and will be!  You are (or will be) building your solutions on top of a 
top-notch search library, Apache Lucene.

Solr makes using Lucene easier – you can index a variety of data sources 
easily, pretty much out of the box, and you can easily integrate features such 
as faceting, highlighting, and spellchecking – all without writing Java code. 
And if that’s all you need and it works solidly for you, awesome! You can stop 
reading now and attend one of our other excellent training courses that fit 
your needs. But if you are a tinkerer and want to know what makes Solr shine, 
or if you need some new or improved feature read on…

Deeper down, Lucene is cranking – analyzing, buffering, and indexing your 
documents, merging segments, parsing queries, caching data structures, rapidly 
hopping around an inverted index, computing scores, navigating finite state 
machines, and much more.

So how do you go about learning Lucene more deeply? I’d be remiss not to mention 
Lucene in Action, as it’s the most polished and well crafted documentation 
available on the Lucene library. And of course there’s the incredibly vibrant 
and helpful Lucene open source community. Those resources will serve you well, 
but there’s no substitute for live, interactive, personal training to get you 
up to speed fast with best practices.

I’m in the process of overhauling our Lucene training course, which I’ll 
personally be delivering at Lucene EuroCon 2011 in Barcelona next month. This 
new and improved course takes an activity-based approach to learning and using 
Lucene’s API, beginning with the common tasks in building solutions using 
Lucene, whether you’re building directly to Lucene’s API or you’re writing 
custom components for Solr.

One area that I’m particularly jazzed about teaching is “query parsing”, the 
process of taking a user (or machine’s) search request and turning it into the 
appropriate underlying Lucene Query object instance.  Many folks developing 
with Lucene are familiar with Lucene’s QueryParser.  But did you know there are 
a couple of other query parsers with special powers?  There’s the surround 
query parser, enabling sophisticated proximity SpanQuery clauses.  And there’s 
the mysterious “XML query parser” (don’t let the ugly sounding name dissuade 
you) that slots dynamic query parameters, such as coming from an “advanced 
search” request, into a tree structured query template.   There’s some more 
insight into the world of Lucene query parsers an “Exploring Query Parsers” 
blog post.

What about all the Lucene contrib module activity in the Lucene 3.x releases?  
 Here’s a bit of the goodness: better Unicode handling with the ICU 
tokenizers and filters, improved stemming, and many other analysis 
improvements, field grouping/collapsing, and block join/query for handling 
particular parent/child relationships.

Come learn the latest about the amazing Lucene library at Lucene EuroCon!  You, 
your boss, and your projects will all be glad you did.

Re: Parameter not working for master/slave

2011-09-12 Thread Pulkit Singhal
Hello Bill,

I can't really answer your question about replication being supported on
Solr3.3 (I use trunk 4.x myself) BUT I can tell you that if each Solr node
has just one core ... only then does it make sense to use
-Denable.master=true and -Denable.slave=true ... otherwise, as Yury points
out, you should use solr.xml to pass in the value for each core
individually.

What is a node you ask? To me it means one App Server (Jetty) running Solr
... doesn't matter if it's multiple ones on the same machine or single ones
on different machines. That's what I mean by a node here.

2011/9/12 Yury Kats 

> On 9/11/2011 11:24 PM, William Bell wrote:
> > I am using 3.3 SOLR. I tried passing in -Denable.master=true and
> > -Denable.slave=true on the Slave machine.
> > Then I changed solrconfig.xml to reference each as per:
> >
> >
> http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node
>
> These are core parameters, you need to set them in solr.xml per core.
>


How to combine RSS w/ Tika when using Data Import Handler (DIH)

2011-09-12 Thread Pulkit Singhal
Given an RSS raw feed source link such as the following:
http://persistent.info/cgi-bin/feed-proxy?url=http%3A%2F%2Fwww.amazon.com%2Frss%2Ftag%2Fblu-ray%2Fnew%2Fref%3Dtag_rsh_hl_ersn

I can easily get to the value of the description for an item like so:


But the content of "description" happens to be in HTML and sadly it is this
HTML chunk that has some pretty decent information that I would like to
import as well.
1) For example it has the image for the item:
<img src="http://ecx.images-amazon.com/images/I/51yyAAoYzKL._SL160_SS160_.jpg" ... />
2) It has the price for the item:
$13.99
And many other useful pieces of data that aren't in a proper rss format but
they are simply thrown together inside the html chunk that is served as the
value for the xpath="/rss/item/description"

So, how can I configure DIH to start importing this html information as
well?
Is Tika the way to go?
Can someone give a brief example of what a config file with both Tika config
and RSS config would/should look like?

Thanks!
- Pulkit


Re: Running solr on small amounts of RAM

2011-09-12 Thread Chris Hostetter

Beyond the suggestions already made, i would add:

a) being really aggressive about stop words can help keep the index size 
down, which can help reduce the amount of memory needed to scan the term 
lists

b) faceting w/o any caching is likely going to be too slow to be 
acceptable.

c) don't sort on anything except score.

-Hoss


RE: select query does not find indexed pdf document

2011-09-12 Thread Bob Sandiford
Hi, Michael.

Well, the stock answer is, 'it depends'

For example - would you want to be able to search filename without searching 
file contents, or would you always search both of them together?  If both, then 
copy both the file name and the parsed file content from the pdf into a single 
search field, and you can set that up as the default search field.

Or - what kind of processing / normalizing do you want on this data?  Case 
insensitive?  Accent insensitive?  If a 'word' contains camel case (e.g. 
TheVeryIdea), do you want that split on the case changes?  (but then watch out 
for things like "iPad")  If a 'word' contains numbers, do want them left 
together, or separated?  Do you want stemming (where searching for 'stemming' 
would also find 'stem', 'stemmed', that sort of thing?)  Is this always 
English, or are the other languages involved.  Do you want the text processing 
to be the same for indexing vs searching?  Do you want to be able to find hits 
based on the first few characters of a term?  (ngrams)

Do you want to be able to highlight text segments where the search terms were 
found?

probably you want to read up on the various tokenizers and filters that are 
available.  Do some prototyping and see how it looks.

Here's a starting point: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Basically, there is no 'one size fits all' here.  Part of the power of Solr / 
Lucene is its configurability to achieve the results your business case calls 
for.  Part of the drawback of Solr / Lucene - especially for new folks - is its 
configurability to achieve the results your business case calls for. :)

Anyone got anything else to suggest for Michael?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

From: Michael Dockery [mailto:dockeryjava...@yahoo.com]
Sent: Monday, September 12, 2011 1:18 PM
To: Bob Sandiford
Subject: Re: select query does not find indexed pdf document

thank you.  that worked.

Any tips for   very   very  basic setup of the schema xml?
   or is the default basic enough?

I basically only want to search search on
filename   andfile contents


From: Bob Sandiford 
To: "solr-user@lucene.apache.org" ; Michael 
Dockery 
Sent: Monday, September 12, 2011 10:04 AM
Subject: RE: select query does not find indexed pdf document

Um - looks like you specified your id value as "pdfy", which is reflected in 
the results from the "*:*" query, but your id query is searching for "vpn", 
hence no matches...

What does this query yield?

http://www/SearchApp/select/?q=id:pdfy

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | 
bob.sandif...@sirsidynix.com
www.sirsidynix.com

> -Original Message-
> From: Michael Dockery 
> [mailto:dockeryjava...@yahoo.com]
> Sent: Monday, September 12, 2011 9:56 AM
> To: solr-user@lucene.apache.org
> Subject: Re: select query does not find indexed pdf document
>
> http://www/SearchApp/select/?q=id:vpn
>
> yields this:
>
>   0
>   15
>   id:vpn
>
> *
>
>  http://www/SearchApp/select/?q=*:*
>
> yields this:
>
>   0
>   16
>   *.*
>
>   doc
>   application/pdf
>   pdfy
>   2011-05-20T02:08:48Z
>   dmvpndeploy.pdf
>
>
> From: Jan Høydahl mailto:jan@cominvent.com>>
> To: solr-user@lucene.apache.org; Michael 
> Dockery
> mailto:dockeryjava...@yahoo.com>>
> Sent: Monday, September 12, 2011 4:59 AM
> Subject: Re: select query does not find indexed pdf document
>
> Hi,
>
> What do you get from a query http://www/SearchApp/select/?q=*:* or
> http://www/SearchApp/select/?q=id:vpn ?
> You may not have mapped the fields correctly to your schema?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 12. sep. 2011, at 02:12, Michael Dockery wrote:
>
> > I am new to solr.
> >
> > I tried to upload a pdf file via curl to my solr webapp (on tomcat)
> >
> > curl
> "http://www/SearchApp/update/extract?stream.file=c:\dmvpn.pdf&stream.co
> ntentType=application/pdf&literal.id=pdfy&commit=true"
> >
> >
> >
> > 
> > 
> > <int name="status">0</int><int name="QTime">860</int>
> > 
> >
> >
> > but
> >
> > http://www/SearchApp/select/?q=vpn
> >
> >
> > does not find the document
> >
> >
> > 
> > 
> > 0
> > 0
> > 
> > vpn
> > 
> > 
> > 
> > 
> >
> >
> > help is appreciated.
> >
> > =
> > fyi
> > I point my test webapp to the index/solr home via mod meta-
> data/context.xml
> > 
> > >  value="c:/solr_home" override="true" />
> >
> > and I had to copy all these jars to my webapp lib dir: (to avoid the
> classnotfound)
> > Solr_download\contrib\extr

Re: Parameter not working for master/slave

2011-09-12 Thread Erik Hatcher

On Sep 11, 2011, at 23:24 , William Bell wrote:

> I am using 3.3 SOLR. I tried passing in -Denable.master=true and
> -Denable.slave=true on the Slave machine.
> Then I changed solrconfig.xml to reference each as per:
> 
> http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node
> 
> But this is not working. The enable parameter does not appear to work in 3.3.
> 
> If this supposed to be working? What else can I do to debug it? How
> can I see other parameters working in solrconfig.xml ?

Bill -

To test a system parameter being passed in, you can try this trick - edit the 
/debug/dump handler (or any request handler you fancy, really) like so:

  <requestHandler name="/debug/dump" class="solr.DumpRequestHandler">
    <lst name="defaults">
      <str name="test_param">${solr.test_param:DEFAULT}</str>
      <str name="echoParams">explicit</str>
      <str name="echoHandler">true</str>
    </lst>
  </requestHandler>

I launched Jetty like this: java -Dsolr.test_param=MY_CUSTOM_VALUE -jar start.jar 

And http://localhost:8983/solr/debug/dump?wt=json&indent=on yields this:

{
  "responseHeader":{
"status":0,
"QTime":2,
"handler":"org.apache.solr.handler.DumpRequestHandler",
"params":{
  "indent":"on",
  "wt":"json"}},
  "params":{
"echoParams":"explicit",
"test_param":"MY_CUSTOM_VALUE",
"echoHandler":"true",
"indent":"on",
"wt":"json"},
  "context":{
"webapp":"/solr",
"path":"/debug/dump"}}

Erik



Re: Parameter not working for master/slave

2011-09-12 Thread Yury Kats
On 9/11/2011 11:24 PM, William Bell wrote:
> I am using 3.3 SOLR. I tried passing in -Denable.master=true and
> -Denable.slave=true on the Slave machine.
> Then I changed solrconfig.xml to reference each as per:
> 
> http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node

These are core parameters, you need to set them in solr.xml per core.


Re: Solr: Return field names that contain search term

2011-09-12 Thread darren

I also would like to know the answer to this. But my feeling is
that you can't do what you want. I also had to use the highlighting
workaround and an aggregate dynamic field to work around the inability
of multivalued fields to accommodate it.

On Mon, 12 Sep 2011 11:44:01 -0400, Rahul Warawdekar
 wrote:
> Hi,
> 
> I have a query on Solr search, as follows.
> 
> I am indexing an entity which includes a multivalued field using DIH.
> This multivalued field contains content from multiple attachments for
> a single entity.
> 
> Now, for eg. if i search for the term "solr", will I be able to know
> which field contains this search term ?
> And if it is a multivalued field, which field number in that
> multivalued field contains the search term ?
> 
> Currently, to achieve this, I am using a workaround using the
> highlighting feature.
> I am indexing all the multiple attachments within a single entity and
> document as dynamic fields "*_i".
> 
> While searching, I am highlighting on these dynamic fields (hl.fl=*_i)
> and from the highlighitng section in the results, I am able to get the
> attachment number which contains the search term.
> But since this approach involves highlighting large attachments, the
> search response times are very slow.
> 
> Would highly appreciate if someone can suggest other efficient ways to
> address this kind of a requirement.


Solr: Return field names that contain search term

2011-09-12 Thread Rahul Warawdekar
Hi,

I have a query on Solr search, as follows.

I am indexing an entity which includes a multivalued field using DIH.
This multivalued field contains content from multiple attachments for
a single entity.

Now, for eg. if i search for the term "solr", will I be able to know
which field contains this search term ?
And if it is a multivalued field, which field number in that
multivalued field contains the search term ?

Currently, to achieve this, I am using a workaround using the
highlighting feature.
I am indexing all the multiple attachments within a single entity and
document as dynamic fields "*_i".

While searching, I am highlighting on these dynamic fields (hl.fl=*_i)
and from the highlighitng section in the results, I am able to get the
attachment number which contains the search term.
But since this approach involves highlighting large attachments, the
search response times are very slow.
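
A rough SolrJ sketch of this workaround (the server URL is an assumption; *_i
are the dynamic attachment fields described above):

import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AttachmentHighlight {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("solr");
        query.setHighlight(true);
        query.setParam("hl.fl", "*_i");

        QueryResponse rsp = server.query(query);
        // doc id -> (field name -> snippets); the field names tell you
        // which attachment field contained the search term
        Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
        for (Map.Entry<String, Map<String, List<String>>> e : hl.entrySet()) {
            System.out.println("doc " + e.getKey()
                    + " matched in fields " + e.getValue().keySet());
        }
    }
}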

Would highly appreciate if someone can suggest other efficient ways to
address this kind of a requirement.

-- 
Thanks and Regards
Rahul A. Warawdekar


London Open Source Search Social - Tuesday 18th October

2011-09-12 Thread Richard Marr
Hi all,

That's right, hold on to your hats, we're holding another London Search
Social on the 18th Oct.
http://www.meetup.com/london-search-social/events/33218292/

Venue is still TBD, but highly likely to be a quiet(ish) central London pub.

There's usually a healthy mix of experience and backgrounds, and pet topics
or show-n-tell projects are welcome.

Save the date, it'd be great to see you there.


-- 
Richard Marr


Re: Stemming and other tokenizers

2011-09-12 Thread Jan Høydahl
Hi,

Do they? Can you explain the layout of the documents? 

There are two ways to handle multi lingual docs. If all your docs have both an 
English and a Norwegian version, you may either split these into two separate 
documents, each with the "language" field filled by LangId - which then also 
lets you filter by language. Or you may assign a title_en and title_no to the 
same document (expand with more fields if you have more languages per 
document), and keep it as one document. Your client will then be adapted to 
search the language(s) that the user wants.

If one document has multiple languages within the same field, e.g. "body", say 
one paragraph of English and the next is Norwegian, then we currently do not 
have any capability in Solr to apply different analysis (tokenization, stemming 
etc) to each paragraph.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 12. sep. 2011, at 11:37, Manish Bafna wrote:

> What if a single document has multiple languages?
> 
> On Mon, Sep 12, 2011 at 2:23 PM, Jan Høydahl  wrote:
> 
>> Hi
>> 
>> Everybody else use dedicated field per language, so why can't you?
>> Please explain your use case, and perhaps we can better help understand
>> what you're trying to do.
>> Do you always know the query language in advance?
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>> 
>> On 12. sep. 2011, at 08:28, Patrick Sauts wrote:
>> 
>>> I can't create one field per language, that is the problem but I'll dig
>> into
>>> it following your indications.
>>> I let you know what I could come out with.
>>> 
>>> Patrick.
>>> 
>>> 2011/9/11 Jan Høydahl 
>>> 
 Hi,
 
 You'll not be able to detect language and change stemmer on the same
>> field
 in one go. You need to create one fieldType in your schema per language
>> you
 want to use, and then use LanguageIdentification (SOLR-1979) to do the
>> magic
 of detecting language and renaming the field. If you set
 langid.override=false, languid.map=true and populate your "language"
>> field
 with the known language, you will probably get the desired effect.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com
 
 On 10. sep. 2011, at 03:24, Patrick Sauts wrote:
 
> Hello,
> 
> 
> 
> I want to implement some kind of AutoStemming that will detect the
> language of a field based on a tag at the start of this field, like #en#.
> My field is stored on disc but I don't want this tag to be stored. Is
> there a way to avoid this tag being stored?
> 
> To me all the filters and the tokenizers interact only with the indexed
> field and not the stored one.
> 
> Am I wrong ?
> 
> Is it possible for you to do such a filter?
> 
> 
> 
> Patrick.
> 
 
 
>> 
>> 



Re: FastVectorHighlighter with wildcard queries

2011-09-12 Thread Rahul Warawdekar
Hi Koji,

Thanks for the information !
I will try the patches provided by you.

On 9/8/11, Koji Sekiguchi  wrote:
> (11/09/09 6:16), Rahul Warawdekar wrote:
>> Hi,
>>
>> I am currently evaluating the FastVectorHighlighter in a Solr search based
>> project and have a couple of questions
>>
>> 1. Is there any specific reason why the FastVectorHighlighter does not
>> provide support for multiterm(wildcard) queries ?
>> 2. What are the other constraints when using FastVectorHighlighter ?
>>
>
> FVH used to have typical constraints:
>
> 1. supports only TermQuery and PhraseQuery (and
> BooleanQuery/DisjunctionMaxQuery that
> include TQ and PQ)
> 2. ignores word boundary
>
> But now for 1, FVH will support other queries:
>
> https://issues.apache.org/jira/browse/LUCENE-1889
>
> I believe it is almost close to being fixed. For 2, FVH in the latest
> trunk/3x, pays
> regard to word or sentence boundary through BoundaryScanner:
>
> https://issues.apache.org/jira/browse/LUCENE-1824
>
> koji
> --
> Check out "Query Log Visualizer" for Apache Solr
> http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
> http://www.rondhuit.com/en/
>


-- 
Thanks and Regards
Rahul A. Warawdekar


RE: How to search on specific file types ?

2011-09-12 Thread Jaeger, Jay - DOT
Some possibilities:

1) Put the file extension into your index (that is what we did when we were 
testing indexing documents with Solr)
2) Put a mime type for the document into your index.
3) Put the whole file name / URL into your index, and match on part of the 
name.  This will give some false positives.
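
For option 2, a minimal SolrJ sketch (the "content_type" field name is an
assumption; use whatever field your index stores the mime type in):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class FileTypeFilter {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("articles");
        // keep only pdf/doc/docx hits, dropping html pages
        query.addFilterQuery("content_type:(application/pdf"
                + " OR application/msword"
                + " OR application/vnd.openxmlformats-officedocument.wordprocessingml.document)");

        System.out.println(server.query(query).getResults().getNumFound());
    }
}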

JRJ

-Original Message-
From: ahmad ajiloo [mailto:ahmad.aji...@gmail.com] 
Sent: Monday, September 12, 2011 5:58 AM
To: solr-user@lucene.apache.org
Subject: Fwd: How to search on specific file types ?

Hello
I want to search on articles, so I need to find only specific file types like
doc, docx, and pdf. I don't need any HTML pages; the results of our search
should consist only of doc, docx, and pdf files.
Can you help me?


Re: MMapDirectory failed to map a 23G compound index segment

2011-09-12 Thread Rich Cariens
Thanks. It's definitely repeatable and I may spend some time plumbing this
further. I'll let the list know if I find anything.

The problem went away once I optimized the index down to a single segment
using a simple IndexWriter driver. This was a bit strange since the
resulting index contained similarly large (> 23G) files. The JVM didn't seem
to have any trouble MMap'ing those.

No, I don't need (or necessarily want) to use compound index file formats.
That was actually a goof on my part which I've since corrected :).

On Fri, Sep 9, 2011 at 9:42 PM, Lance Norskog  wrote:

> I remember now: by memory-mapping one block of address space that big, the
> garbage collector has problems working around it. If the OOM is repeatable,
> you could try watching the app with jconsole and watch the memory spaces.
>
> Lance
>
> On Thu, Sep 8, 2011 at 8:58 PM, Lance Norskog  wrote:
>
> > Do you need to use the compound format?
> >
> > On Thu, Sep 8, 2011 at 3:57 PM, Rich Cariens  >wrote:
> >
> >> I should add some more context:
> >>
> >>   1. the problem index included several cfs segment files that were
> around
> >>   4.7G, and
> >>   2. I'm running four SOLR instances on the same box, all of which have
> >>   similiar problem indeces.
> >>
> >> A colleague thought perhaps I was bumping up against my 256,000 open
> files
> >> ulimit. Do the MultiMMapIndexInput ByteBuffer arrays each consume a file
> >> handle/descriptor?
> >>
> >> On Thu, Sep 8, 2011 at 5:19 PM, Rich Cariens 
> >> wrote:
> >>
> >> > FWiW I optimized the index down to a single segment and now I have no
> >> > trouble opening an MMapDirectory on that index, even though the 23G
> cfx
> >> > segment file remains.
> >> >
> >> >
> >> > On Thu, Sep 8, 2011 at 4:27 PM, Rich Cariens  >> >wrote:
> >> >
> >> >> Thanks for the response. "free -g" reports:
> >> >>
> >> >>              total   used   free   shared   buffers   cached
> >> >> Mem:           141     95     46        0         0       93
> >> >> -/+ buffers/cache:      2    139
> >> >> Swap:            3      0      3
> >> >>
> >> >> 2011/9/7 François Schiettecatte 
> >> >>
> >> >>> My memory of this is a little rusty but isn't mmap also limited by
> mem
> >> +
> >> >>> swap on the box? What does 'free -g' report?
> >> >>>
> >> >>> François
> >> >>>
> >> >>> On Sep 7, 2011, at 12:25 PM, Rich Cariens wrote:
> >> >>>
> >> >>> > Ahoy ahoy!
> >> >>> >
> >> >>> > I've run into the dreaded OOM error with MMapDirectory on a 23G
> cfs
> >> >>> compound
> >> >>> > index segment file. The stack trace looks pretty much like every
> >> other
> >> >>> trace
> >> >>> > I've found when searching for OOM & "map failed"[1]. My
> >> configuration
> >> >>> > follows:
> >> >>> >
> >> >>> > Solr 1.4.1/Lucene 2.9.3 (plus
> >> >>> > SOLR-1969
> >> >>> > )
> >> >>> > CentOS 4.9 (Final)
> >> >>> > Linux 2.6.9-100.ELsmp x86_64 yada yada yada
> >> >>> > Java SE (build 1.6.0_21-b06)
> >> >>> > Hotspot 64-bit Server VM (build 17.0-b16, mixed mode)
> >> >>> > ulimits:
> >> >>> >core file size (blocks, -c) 0
> >> >>> >data seg size(kbytes, -d) unlimited
> >> >>> >file size (blocks, -f) unlimited
> >> >>> >pending signals(-i) 1024
> >> >>> >max locked memory (kbytes, -l) 32
> >> >>> >max memory size (kbytes, -m) unlimited
> >> >>> >open files(-n) 256000
> >> >>> >pipe size (512 bytes, -p) 8
> >> >>> >POSIX message queues (bytes, -q) 819200
> >> >>> >stack size(kbytes, -s) 10240
> >> >>> >cpu time(seconds, -t) unlimited
> >> >>> >max user processes (-u) 1064959
> >> >>> >virtual memory(kbytes, -v) unlimited
> >> >>> >file locks(-x) unlimited
> >> >>> >
> >> >>> > Any suggestions?
> >> >>> >
> >> >>> > Thanks in advance,
> >> >>> > Rich
> >> >>> >
> >> >>> > [1]
> >> >>> > ...
> >> >>> > java.io.IOException: Map failed
> >> >>> > at sun.nio.ch.FileChannelImpl.map(Unknown Source)
> >> >>> > at
> >> org.apache.lucene.store.MMapDirectory$MMapIndexInput.(Unknown
> >> >>> > Source)
> >> >>> > at
> >> org.apache.lucene.store.MMapDirectory$MMapIndexInput.(Unknown
> >> >>> > Source)
> >> >>> > at org.apache.lucene.store.MMapDirectory.openInput(Unknown Source)
> >> >>> > at
> org.apache.lucene.index.SegmentReader$CoreReaders.(Unknown
> >> >>> Source)
> >> >>> >
> >> >>> > at org.apache.lucene.index.SegmentReader.get(Unknown Source)
> >> >>> > at org.apache.lucene.index.SegmentReader.get(Unknown Source)
> >> >>> > at org.apache.lucene.index.DirectoryReader.(Unknown Source)
> >> >>> > at org.apache.lucene.index.ReadOnlyDirectoryReader.(Unknown
> >> >>> Source)
> >> >>> > at org.apache.lucene.index.DirectoryReader$1.doBody(Unknown
> Source)
> >> >>> > at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(Unknown
> >> >>> > Source)
> >> >>> > at org.apache.lucene.index.DirectoryRea

RE: Master Slave Question

2011-09-12 Thread Jaeger, Jay - DOT
You could prevent queries to the master by limiting what IP addresses are 
allowed to communicate with it, or by modifying web.xml to put different 
security on /update vs. /select .

We took a simplistic approach.  We did some load testing, and discovered that 
we could handle our expected update load and our query load on the master.  We 
use replication just for failover in case the master dies (in which case 
updates would be held up until the master was fixed).  (We also have security 
on both /update and /select -- just different security).

JRJ

-Original Message-
From: Patrick Sauts [mailto:patrick.via...@gmail.com] 
Sent: Saturday, September 10, 2011 11:26 AM
To: solr-user@lucene.apache.org
Subject: Re: Master Slave Question

Real Time indexing (solr 4) or decrease replication poll and auto commit
time.

2011/9/10 Jamie Johnson 

> Is it appropriate to query the master servers when replicating?  I ask
> because there could be a case where we index say 50 documents to the
> master, they have not yet been replicated and a user asks for page 2,
> when they ask for page 2 the request could be sent to a slave and get
> 0.  Is there a way to avoid this?  My thought was to not allow
> querying of the master but I'm not sure that this could be configured
> in solr
>


RE: question about StandardAnalyzer, differences between solr 1.4 and solr 3.3

2011-09-12 Thread Jaeger, Jay - DOT
Looking at the Wiki  ( 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters ), it looks like 
the solr.StandardTokenizerFactory changed with Solr 3.1 .

We use solr.KeywordTokenizerFactory for our middle names (and then also throw 
in solr.LowerCaseFilterFactory to normalize to lower case).  It treats the 
entire field as a single token, and in general doesn't "futz" with what came in.

You might try the analyzer panel on the admin web page to see what exactly is 
happening during indexing and analysis.

JRJ

-Original Message-
From: Marc Des Garets [mailto:marc.desgar...@192.com] 
Sent: Friday, September 09, 2011 5:21 AM
To: solr-user@lucene.apache.org
Subject: question about StandardAnalyzer, differences between solr 1.4 and solr 
3.3

Hi,

I have a simple field defined like this:

  


Which I use here:
   

In solr 1.4, I could do:
?q=(middlename:a*)

And I was getting all documents where middlename = A or where middlename starts 
with the letter A.

In solr 3.3, I get only results where middlename starts with the letter A but not 
where middlename is equal to A.

The thing is, this happens only with the letter A; with other letters it is 
fine, I get the ones starting with the letter and the ones equal to the letter. 
My guess is that it considers A as the English article, but I do not specify any 
filter with stopwords, so how come the behaviour with the letter A is different 
from the other letters? Is there a bug? How can I change my field to work with 
the letter A the same way it does with other letters?


Thanks,
Marc
--
This transmission is strictly confidential, possibly legally privileged, and 
intended solely for the 
addressee.  Any views or opinions expressed within it are those of the author 
and do not necessarily 
represent those of 192.com, i-CD Publishing (UK) Ltd or any of it's subsidiary 
companies.  If you 
are not the intended recipient then you must not disclose, copy or take any 
action in reliance of this 
transmission. If you have received this transmission in error, please notify 
the sender as soon as 
possible.  No employee or agent is authorised to conclude any binding agreement 
on behalf of 
i-CD Publishing (UK) Ltd with another party by email without express written 
confirmation by an 
authorised employee of the Company. http://www.192.com (Tel: 08000 192 192).  
i-CD Publishing (UK) Ltd 
is incorporated in England and Wales, company number 3148549, VAT No. GB 
673128728.


RE: select query does not find indexed pdf document

2011-09-12 Thread Bob Sandiford
Um - looks like you specified your id value as "pdfy", which is reflected in 
the results from the "*:*" query, but your id query is searching for "vpn", 
hence no matches...

What does this query yield?

http://www/SearchApp/select/?q=id:pdfy

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

> -Original Message-
> From: Michael Dockery [mailto:dockeryjava...@yahoo.com]
> Sent: Monday, September 12, 2011 9:56 AM
> To: solr-user@lucene.apache.org
> Subject: Re: select query does not find indexed pdf document
> 
> http://www/SearchApp/select/?q=id:vpn
>
> yields this:
>
>   0
>   15
>   id:vpn
>
> *
>
>  http://www/SearchApp/select/?q=*:*
>
> yields this:
>
>   0
>   16
>   *.*
>
>   doc
>   application/pdf
>   pdfy
>   2011-05-20T02:08:48Z
>   dmvpndeploy.pdf
> 
> 
> From: Jan Høydahl 
> To: solr-user@lucene.apache.org; Michael Dockery
> 
> Sent: Monday, September 12, 2011 4:59 AM
> Subject: Re: select query does not find indexed pdf document
> 
> Hi,
> 
> What do you get from a query http://www/SearchApp/select/?q=*:* or
> http://www/SearchApp/select/?q=id:vpn ?
> You may not have mapped the fields correctly to your schema?
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
> 
> On 12. sep. 2011, at 02:12, Michael Dockery wrote:
> 
> > I am new to solr.
> >
> > I tried to upload a pdf file via curl to my solr webapp (on tomcat)
> >
> > curl
> "http://www/SearchApp/update/extract?stream.file=c:\dmvpn.pdf&stream.co
> ntentType=application/pdf&literal.id=pdfy&commit=true"
> >
> >
> >
> > 
> > 
> > <int name="status">0</int><int name="QTime">860</int>
> > 
> >
> >
> > but
> >
> > http://www/SearchApp/select/?q=vpn
> >
> >
> > does not find the document
> >
> >
> > 
> > 
> > 0
> > 0
> > 
> > vpn
> > 
> > 
> > 
> > 
> >
> >
> > help is appreciated.
> >
> > =
> > fyi
> > I point my test webapp to the index/solr home via mod meta-
> data/context.xml
> > 
> >     >  value="c:/solr_home" override="true" />
> >
> > and I had to copy all these jars to my webapp lib dir: (to avoid the
> classnotfound)
> > Solr_download\contrib\extraction\lib
> >  ...in the future i plan to put them in the tomcat/lib dir.
> >
> >
> > Also, I have not modified conf\solrconfig.xml or schema.xml.



Re: Problem with SolrJ and Grouping

2011-09-12 Thread Martijn v Groningen
The changes to that class were minor. There is only support for
parsing a grouped response. Check the QueryResponse class; there is a
method getGroupResponse().
I ran into similar exceptions when creating the
QueryResponseTest#testGroupResponse test. The test uses an XML response
from a file.
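
A minimal sketch of reading that structure with the 3.4 SolrJ API (the server
and query setup are assumed to be in place already):

import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.GroupResponse;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupedResults {
    // rsp is the response of a query with group=true&group.field=GeonameId
    static void printGroups(QueryResponse rsp) {
        GroupResponse grouped = rsp.getGroupResponse();
        for (GroupCommand command : grouped.getValues()) {  // one per group.field
            for (Group group : command.getValues()) {       // one per group value
                System.out.println(group.getGroupValue() + " -> "
                        + group.getResult().getNumFound() + " docs");
            }
        }
    }
}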

On 12 September 2011 15:38, Kirill Lykov  wrote:
> Martijn,
>
> I can't find the fixed version.
> I've got the latest version of SolrJ but I see only minor changes in
> XMLResponseParser.java. And it doesn't support grouping yet. I also
> checked branch_3x, branch for 3.4.
>
> On Mon, Sep 12, 2011 at 5:45 PM, Martijn v Groningen
>  wrote:
>> Also the error you described when wt=xml and using SolrJ is also fixed
>> in 3.4 (and in trunk / branch3x).
>> You can wait for the 3.4 release or use a nightly 3x build.
>>
>> Martijn
>>
>> On 12 September 2011 12:41, Sanal K Stephen  wrote:
>>> Kirill,
>>>
>>>         Parsing the grouped result using SolrJ is not released yet, I
>>> think; it's going to be released with Solr 3.4.0. The SolrJ client cannot parse
>>> grouped and range facet results (SOLR-2523).
>>>
>>> see the release notes of Solr 3.4.0
>>> http://wiki.apache.org/solr/ReleaseNote34
>>>
>>>
>>> On Mon, Sep 12, 2011 at 3:51 PM, Kirill Lykov wrote:
>>>
 I found that SolrQuery doesn’t work with grouping.
 I constructed SolrQuery this way:

 solrQuery = constructFullSearchQuery(searchParams);
 solrQuery.set("group", true);
 solrQuery.set("group.field", "GeonameId");

 Solr successfully handles request and writes about that in log:

 INFO: [] webapp=/solr path=/select

 params={start=1&q=*:*&timeAllowed=1500&group.field=GeonameId&group=true&wt=xml&rows=20&version=2.2}
 hits=12099579 status=0 QTime=2968

 The error occurs when SolrJ tries to parse
 XMLResponseParser.processResponse (line 324), where builder stores
 “”:

        Object val = type.read( builder.toString().trim() );
        if( val == null && type != KnownType.NULL) {
          throw new XMLStreamException( "error reading value:"+type,
 parser.getLocation() );
        }
        vals.add( val );
        break;

 The problem is - val is null. It happens because handler for the type
 LST returns null(line 178 in the same file):

 LST    (false) { @Override public Object read( String txt ) { return null;
 } },

 I don’t understand why it works this way. The XML returned by
 Solr is valid.
 I attached the response XML to this message, in case it helps. The error
 occurs at line 3, column 14 661.
 I use apache solr 3.3.0 and the same SolrJ.
 --
 Best regards,
 Kirill Lykov,
 Software Engineer,
 Data East LLC,
 tel.:+79133816052,
 LinkedIn profile: http://www.linkedin.com/pub/kirill-lykov/12/860/16

>>>
>>>
>>>
>>> --
>>> Regards,
>>> Sanal Kannappilly Stephen
>>>
>>
>>
>>
>> --
>> Met vriendelijke groet,
>>
>> Martijn van Groningen
>>
>
>
>
> --
> Best regards,
> Kirill Lykov,
> Software Engineer,
> Data East LLC,
> tel.:+79133816052,
> LinkedIn profile: http://www.linkedin.com/pub/kirill-lykov/12/860/16
>



-- 
Met vriendelijke groet,

Martijn van Groningen


Re: is it possible to do a scheduling internally in solr application?

2011-09-12 Thread O. Klein
The easiest way is to use CRON and cURL for this.

So add something like curl
http://localhost:8080/solr/dataimport?command=full-import to your cron.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-it-possibler-to-do-a-scheduling-internally-in-solr-application-tp3329381p3329667.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: select query does not find indexed pdf document

2011-09-12 Thread Michael Dockery
http://www/SearchApp/select/?q=id:vpn

yields this:

  0
  15
  id:vpn

*

http://www/SearchApp/select/?q=*:*

yields this:

  0
  16
  *.*

  doc
  application/pdf
  pdfy
  2011-05-20T02:08:48Z
  dmvpndeploy.pdf


From: Jan Høydahl 
To: solr-user@lucene.apache.org; Michael Dockery 
Sent: Monday, September 12, 2011 4:59 AM
Subject: Re: select query does not find indexed pdf document

Hi,

What do you get from a query http://www/SearchApp/select/?q=*:* or 
http://www/SearchApp/select/?q=id:vpn ?
You may not have mapped the fields correctly to your schema?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 12. sep. 2011, at 02:12, Michael Dockery wrote:

> I am new to solr.  
> 
> I tried to upload a pdf file via curl to my solr webapp (on tomcat)
> 
> curl 
> "http://www/SearchApp/update/extract?stream.file=c:\dmvpn.pdf&stream.contentType=application/pdf&literal.id=pdfy&commit=true";
> 
> 
> 
> 
> 
> > <int name="status">0</int><int name="QTime">860</int>
> 
> 
> 
> but
> 
> http://www/SearchApp/select/?q=vpn
> 
> 
> does not find the document
> 
> 
> 
> 
> 0
> 0
> 
> vpn
> 
> 
> 
> 
> 
> 
> help is appreciated.
> 
> =
> fyi
> I point my test webapp to the index/solr home via mod meta-data/context.xml
> 
>      value="c:/solr_home" override="true" />
> 
> and I had to copy all these jars to my webapp lib dir: (to avoid the 
> classnotfound)
> Solr_download\contrib\extraction\lib
>  ...in the future i plan to put them in the tomcat/lib dir.
> 
> 
> Also, I have not modified conf\solrconfig.xml or schema.xml.

Re: Problem with SolrJ and Grouping

2011-09-12 Thread Kirill Lykov
Martijn,

I can't find the fixed version.
I've got the last version of SolrJ but I see only minor changes in
XMLResponseParser.java. And it doesn't support grouping yet. I also
checked branch_3x, branch for 3.4.

On Mon, Sep 12, 2011 at 5:45 PM, Martijn v Groningen
 wrote:
> Also the error you described when wt=xml and using SolrJ is also fixed
> in 3.4 (and in trunk / branch3x).
> You can wait for the 3.4 release or use a nightly 3x build.
>
> Martijn
>
> On 12 September 2011 12:41, Sanal K Stephen  wrote:
>> Kirill,
>>
>>         Parsing the grouped result using SolrJ is not released yet, I
>> think; it's going to be released with Solr 3.4.0. The SolrJ client cannot parse
>> grouped and range facet results (SOLR-2523).
>>
>> see the release notes of Solr 3.4.0
>> http://wiki.apache.org/solr/ReleaseNote34
>>
>>
>> On Mon, Sep 12, 2011 at 3:51 PM, Kirill Lykov wrote:
>>
>>> I found that SolrQuery doesn’t work with grouping.
>>> I constructed SolrQuery this way:
>>>
>>> solrQuery = constructFullSearchQuery(searchParams);
>>> solrQuery.set("group", true);
>>> solrQuery.set("group.field", "GeonameId");
>>>
>>> Solr successfully handles request and writes about that in log:
>>>
>>> INFO: [] webapp=/solr path=/select
>>>
>>> params={start=1&q=*:*&timeAllowed=1500&group.field=GeonameId&group=true&wt=xml&rows=20&version=2.2}
>>> hits=12099579 status=0 QTime=2968
>>>
>>> The error occurs when SolrJ tries to parse
>>> XMLResponseParser.processResponse (line 324), where builder stores
>>> “”:
>>>
>>>        Object val = type.read( builder.toString().trim() );
>>>        if( val == null && type != KnownType.NULL) {
>>>          throw new XMLStreamException( "error reading value:"+type,
>>> parser.getLocation() );
>>>        }
>>>        vals.add( val );
>>>        break;
>>>
>>> The problem is - val is null. It happens because handler for the type
>>> LST returns null(line 178 in the same file):
>>>
>>> LST    (false) { @Override public Object read( String txt ) { return null;
>>> } },
>>>
>>> I don’t understand why it works this way. The XML returned by
>>> Solr is valid.
>>> I attached the response XML to this message, in case it helps. The error
>>> occurs at line 3, column 14 661.
>>> I use apache solr 3.3.0 and the same SolrJ.
>>> --
>>> Best regards,
>>> Kirill Lykov,
>>> Software Engineer,
>>> Data East LLC,
>>> tel.:+79133816052,
>>> LinkedIn profile: http://www.linkedin.com/pub/kirill-lykov/12/860/16
>>>
>>
>>
>>
>> --
>> Regards,
>> Sanal Kannappilly Stephen
>>
>
>
>
> --
> Met vriendelijke groet,
>
> Martijn van Groningen
>



-- 
Best regards,
Kirill Lykov,
Software Engineer,
Data East LLC,
tel.:+79133816052,
LinkedIn profile: http://www.linkedin.com/pub/kirill-lykov/12/860/16


Re: Solandra - select query error

2011-09-12 Thread tom135
It's complicated to give you sample data. But this error depends on the size
of the data. I indexed 200 docs and this error did not occur. But I need
much more (i.e. 5 000 000), so if I try to index 2000 docs then this error
occurs.



Re: Solandra - select query error

2011-09-12 Thread Jake Luciani
Hi,

Solandra-specific issues should be raised on
http://github.com/tjake/Solandra/issues

Could you also provide some sample data and schema I can try to reproduce
with?

Thanks,

Jake

On Mon, Sep 12, 2011 at 7:57 AM, tom135  wrote:

> Hello,
>
> I have some index and two search query:
> 1. http://127.0.0.1:8983/solandra/INDEX_NAME/select?q=type:(3 2 1) AND
> category:(2 1) AND text:(WORD1 WORD2 WORD3 WORD4
> WORD5)&facet.field=creation_date&facet=true&wt=javabin&version=2
>
> This query works fine
>
> 2. http://127.0.0.1:8983/solandra/INDEX_NAME/select?q=type:(2 1) AND
> category:(2 1) AND text:(WORD1 WORD2 WORD3 WORD4
> WORD5)&facet.field=creation_date&facet=true&wt=javabin&version=2
>
> This select throws an error:
> *===*
> *HTTP ERROR 500*
>
> Problem accessing /solandra/INDEX_NAME.proj1/select. Reason:
>
>null  java.lang.ArrayIndexOutOfBoundsException
>
> null  java.lang.ArrayIndexOutOfBoundsException
>
> request: http://127.0.0.1:8983/solandra/INDEX_NAME.proj1~3/select
>
> org.apache.solr.common.SolrException: null
> java.lang.ArrayIndexOutOfBoundsException
>
> null  java.lang.ArrayIndexOutOfBoundsException
>
> request: http://127.0.0.1:8983/solandra/INDEX_NAME.proj1~3/select
>at
>
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
>at
>
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>at
>
> org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
>at
>
> org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
>at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>at java.lang.Thread.run(Thread.java:662)
> *===*
>
> Thanks for any help!
>
>
>



-- 
http://twitter.com/tjake


Solandra - select query error

2011-09-12 Thread tom135
Hello,

I have some index and two search query:
1. http://127.0.0.1:8983/solandra/INDEX_NAME/select?q=type:(3 2 1) AND
category:(2 1) AND text:(WORD1 WORD2 WORD3 WORD4
WORD5)&facet.field=creation_date&facet=true&wt=javabin&version=2

This query works fine

2. http://127.0.0.1:8983/solandra/INDEX_NAME/select?q=type:(2 1) AND
category:(2 1) AND text:(WORD1 WORD2 WORD3 WORD4
WORD5)&facet.field=creation_date&facet=true&wt=javabin&version=2

This select throws an error:
*===*
*HTTP ERROR 500*

Problem accessing /solandra/INDEX_NAME.proj1/select. Reason:

null  java.lang.ArrayIndexOutOfBoundsException

null  java.lang.ArrayIndexOutOfBoundsException

request: http://127.0.0.1:8983/solandra/INDEX_NAME.proj1~3/select

org.apache.solr.common.SolrException: null 
java.lang.ArrayIndexOutOfBoundsException

null  java.lang.ArrayIndexOutOfBoundsException

request: http://127.0.0.1:8983/solandra/INDEX_NAME.proj1~3/select
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
at
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
*===*

Thanks for any help!




Re: solr equivalent of "select distinct"

2011-09-12 Thread lee carroll
If you have a limited set of searches which need this, and they act on a
limited, known set of fields, you can concat the fields at index time and
then facet:

PK   FLD1  FLD2  FLD3  FLD4  FLD5  copy45
AB0  A     B     0     x     y     x y
AB1  A     B     1     x     y     x y
CD0  C     D     0     a     b     a b
CD1  C     D     1     e     f     e f

Faceting on the copy45 field would give you the correct "distinct" term
values (plus their counts); a SolrJ sketch of the query side follows below.
It's pretty contrived and limited to knowing in advance the fields you need
to concat.
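
A minimal SolrJ sketch of the query side (assuming a SolrServer handle named
"server" and the copy45 field above; only facet counts are requested):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrQuery q = new SolrQuery("*:*");
q.setRows(0);                  // no documents needed, just facet counts
q.setFacet(true);
q.addFacetField("copy45");
q.setFacetMinCount(1);         // skip terms with no matching docs
QueryResponse rsp = server.query(q);
for (FacetField.Count c : rsp.getFacetField("copy45").getValues()) {
    // one line per distinct FLD4/FLD5 combination, with its count
    System.out.println(c.getName() + " -> " + c.getCount());
}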

What is the use case for this? Maybe another approach would fit better.

lee c

On 11 September 2011 22:26, Michael Sokolov  wrote:
> You can get what you want - unique lists of values from docs matching your
> query - for a single field (using facets), but not for the co-occurrence of
> two field values.  So you could combine the two fields together, if you know
> what they are going to be "in advance."  Facets also give you counts, so in
> some special cases, you could get what you want - eg you can tell when there
> is only a single pair of values since their counts will be the same and the
> same as the total.  But that's all I can think of.
>
> -Mike
>
> On 9/11/2011 12:39 PM, Mark juszczec wrote:
>>
>> Here's an example:
>>
>> PK   FLD1  FLD2  FLD3  FLD4  FLD5
>> AB0  A     B     0     x     y
>> AB1  A     B     1     x     y
>> CD0  C     D     0     a     b
>> CD1  C     D     1     e     f
>>
>> I want to write a query using only the terms FLD1 and FLD2 and ONLY get
>> back:
>>
>> A B x y
>> C D a b
>> C D e f
>>
>> Since FLD4 and FLD5 are the same for PK=AB0 and AB1, I only want one
>> occurrence of those records.
>>
>> Since FLD4 and FLD5 are different for PK=CD0 and CD1, I want BOTH
>> occurrences of those records.
>>
>
>


is it possible to do scheduling internally in the solr application?

2011-09-12 Thread vighnesh
hi all

i am unable to do scheduling in solr to execute commands like
full-import and delta-import.

is it possible to do scheduling internally in the solr application?
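
(Solr ships with no built-in scheduler, so imports are usually triggered
externally, e.g. by cron hitting the DIH handler URL. If it must live inside
a JVM, a small client-side timer works too; a sketch, with hypothetical host,
core and interval:)

import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DihScheduler {
  public static void main(String[] args) {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    ses.scheduleAtFixedRate(new Runnable() {
      public void run() {
        try {
          // fire the command; DIH runs the import asynchronously
          new URL("http://localhost:8983/solr/dataimport?command=delta-import")
              .openStream().close();
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    }, 0, 30, TimeUnit.MINUTES);
  }
}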



Re: Nested documents

2011-09-12 Thread Martijn v Groningen
To support this, we also need to implement indexing blocks of documents in Solr.
Basically the UpdateHandler should also use this method:
IndexWriter#addDocuments(Collection<Document> documents)
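
(A minimal Lucene 3.x sketch of what block indexing looks like beneath Solr;
field names are purely illustrative, and by convention the parent document
comes last in the block:)

import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

class BlockIndexer {
  static void indexBlock(IndexWriter writer) throws Exception {
    List<Document> block = new ArrayList<Document>();
    Document child = new Document();
    child.add(new Field("type", "child", Field.Store.YES, Field.Index.NOT_ANALYZED));
    block.add(child);
    Document parent = new Document();
    parent.add(new Field("type", "parent", Field.Store.YES, Field.Index.NOT_ANALYZED));
    block.add(parent); // parent last
    writer.addDocuments(block); // keeps the block contiguous in the segment
  }
}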

On 12 September 2011 01:01, Michael McCandless
 wrote:
> Even if it applies, this is for Lucene.  I don't think we've added
> Solr support for this yet... we should!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sun, Sep 11, 2011 at 12:16 PM, Erick Erickson
>  wrote:
>> Does this JIRA apply?
>>
>> https://issues.apache.org/jira/browse/LUCENE-3171
>>
>> Best
>> Erick
>>
>> On Sat, Sep 10, 2011 at 8:32 PM, Andy  wrote:
>>> Hi,
>>>
>>> Does Solr support nested documents? If not is there any plan to add such a 
>>> feature?
>>>
>>> Thanks.
>>
>



-- 
Met vriendelijke groet,

Martijn van Groningen


Fwd: How to search on specific file types?

2011-09-12 Thread ahmad ajiloo
Hello
I want to search on articles, so I need to find only specific file types like
doc, docx, and pdf.
I don't need any html pages. Thus the result of our search should only
consist of doc, docx, and pdf files.
Can you help me?
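
One common approach (assuming each document is indexed with its MIME type in a
field, say "content_type", e.g. filled from Tika metadata during extraction)
is to restrict every search with a filter query:

fq=content_type:(application/pdf OR application/msword OR application/vnd.openxmlformats-officedocument.wordprocessingml.document)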


Re: Problem with SolrJ and Grouping

2011-09-12 Thread Martijn v Groningen
Also, the error you described when using wt=xml with SolrJ is fixed
in 3.4 (and in trunk / branch_3x).
You can wait for the 3.4 release or use a nightly 3x build.

Martijn
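
For reference, once on 3.4 the grouped result can be read through SolrJ's new
response API; a minimal sketch, reusing the GeonameId group field from the
original query:

import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.GroupResponse;
import org.apache.solr.client.solrj.response.QueryResponse;

QueryResponse rsp = server.query(solrQuery);
GroupResponse grouped = rsp.getGroupResponse();
for (GroupCommand command : grouped.getValues()) {   // one entry per group.field
  System.out.println("total matches: " + command.getMatches());
  for (Group group : command.getValues()) {          // one entry per group value
    System.out.println(group.getGroupValue() + " -> "
        + group.getResult().getNumFound() + " docs");
  }
}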

On 12 September 2011 12:41, Sanal K Stephen  wrote:
> Kirill,
>
>         Parsing the grouped result using SolrJ is not released yet, I
> think. It's going to be released with Solr 3.4.0. The SolrJ client cannot
> parse grouped and range facet results (SOLR-2523).
>
> see the release notes of Solr 3.4.0
> http://wiki.apache.org/solr/ReleaseNote34
>
>
> On Mon, Sep 12, 2011 at 3:51 PM, Kirill Lykov wrote:
>
>> I found that SolrQuery doesn’t work with grouping.
>> I constructed SolrQuery this way:
>>
>> solrQuery = constructFullSearchQuery(searchParams);
>> solrQuery.set("group", true);
>> solrQuery.set("group.field", "GeonameId");
>>
>> Solr successfully handles request and writes about that in log:
>>
>> INFO: [] webapp=/solr path=/select
>>
>> params={start=1&q=*:*&timeAllowed=1500&group.field=GeonameId&group=true&wt=xml&rows=20&version=2.2}
>> hits=12099579 status=0 QTime=2968
>>
>> The error occurs when SolrJ tries to parse
>> XMLResponseParser.processResponse (line 324), where builder stores
>> “”:
>>
>>        Object val = type.read( builder.toString().trim() );
>>        if( val == null && type != KnownType.NULL) {
>>          throw new XMLStreamException( "error reading value:"+type,
>> parser.getLocation() );
>>        }
>>        vals.add( val );
>>        break;
>>
>> The problem is - val is null. It happens because handler for the type
>> LST returns null(line 178 in the same file):
>>
>> LST    (false) { @Override public Object read( String txt ) { return null;
>> } },
>>
>> I don’t understand why it works this way. The XML returned by
>> Solr is valid.
>> In case it helps, I attached the response xml to this letter. The error
>> occurs at line 3, column 14661.
>> I use apache solr 3.3.0 and the same SolrJ.
>> --
>> Best regards,
>> Kirill Lykov,
>> Software Engineer,
>> Data East LLC,
>> tel.:+79133816052,
>> LinkedIn profile: http://www.linkedin.com/pub/kirill-lykov/12/860/16
>>
>
>
>
> --
> Regards,
> Sanal Kannappilly Stephen
>



-- 
Met vriendelijke groet,

Martijn van Groningen


Re: Problem with SolrJ and Grouping

2011-09-12 Thread Sanal K Stephen
Kirill,

Parsing the grouped result using SolrJ is not released yet, I think. It's
going to be released with Solr 3.4.0. The SolrJ client cannot parse grouped
and range facet results (SOLR-2523).

see the release notes of Solr 3.4.0
http://wiki.apache.org/solr/ReleaseNote34


On Mon, Sep 12, 2011 at 3:51 PM, Kirill Lykov wrote:

> I found that SolrQuery doesn’t work with grouping.
> I constructed SolrQuery this way:
>
> solrQuery = constructFullSearchQuery(searchParams);
> solrQuery.set("group", true);
> solrQuery.set("group.field", "GeonameId");
>
> Solr successfully handles request and writes about that in log:
>
> INFO: [] webapp=/solr path=/select
>
> params={start=1&q=*:*&timeAllowed=1500&group.field=GeonameId&group=true&wt=xml&rows=20&version=2.2}
> hits=12099579 status=0 QTime=2968
>
> The error occurs when SolrJ tries to parse
> XMLResponseParser.processResponse (line 324), where builder stores
> “”:
>
>Object val = type.read( builder.toString().trim() );
>if( val == null && type != KnownType.NULL) {
>  throw new XMLStreamException( "error reading value:"+type,
> parser.getLocation() );
>}
>vals.add( val );
>break;
>
> The problem is - val is null. It happens because handler for the type
> LST returns null(line 178 in the same file):
>
> LST(false) { @Override public Object read( String txt ) { return null;
> } },
>
> I don’t understand why it works this way. The XML returned by
> Solr is valid.
> In case it helps, I attached the response xml to this letter. The error
> occurs at line 3, column 14661.
> I use apache solr 3.3.0 and the same SolrJ.
> --
> Best regards,
> Kirill Lykov,
> Software Engineer,
> Data East LLC,
> tel.:+79133816052,
> LinkedIn profile: http://www.linkedin.com/pub/kirill-lykov/12/860/16
>



-- 
Regards,
Sanal Kannappilly Stephen


Problem with SolrJ and Grouping

2011-09-12 Thread Kirill Lykov
I found that SolrQuery doesn’t work with grouping.
I constructed SolrQuery this way:

solrQuery = constructFullSearchQuery(searchParams);
solrQuery.set("group", true);
solrQuery.set("group.field", "GeonameId");

Solr successfully handles request and writes about that in log:

INFO: [] webapp=/solr path=/select
params={start=1&q=*:*&timeAllowed=1500&group.field=GeonameId&group=true&wt=xml&rows=20&version=2.2}
hits=12099579 status=0 QTime=2968

The error occurs when SolrJ tries to parse
XMLResponseParser.processResponse (line 324), where builder stores
“”:

Object val = type.read( builder.toString().trim() );
if( val == null && type != KnownType.NULL) {
  throw new XMLStreamException( "error reading value:"+type,
parser.getLocation() );
}
vals.add( val );
break;

The problem is - val is null. It happens because handler for the type
LST returns null(line 178 in the same file):

LST(false) { @Override public Object read( String txt ) { return null; } },

I don’t understand why it works this way. The XML returned by
Solr is valid.
In case it helps, I attached the response xml to this letter. The error
occurs at line 3, column 14661.
I use apache solr 3.3.0 and the same SolrJ.
-- 
Best regards,
Kirill Lykov,
Software Engineer,
Data East LLC,
tel.:+79133816052,
LinkedIn profile: http://www.linkedin.com/pub/kirill-lykov/12/860/16
 
 
[Attachment: the grouped wt=xml response (group.field=GeonameId) referenced above was flattened beyond recovery by the archive and is omitted here.]

Re: Solr and DateTimes - bug?

2011-09-12 Thread Nicklas Overgaard
I see. I'm using that date to flag that my entity "has not yet ended". I 
can just use another constant which Solr is capable of returning in the 
correct format. The nice thing about DateTime.MinValue is that it's just 
part of the .net framework :)


Hope that the issue is resolved at some point.

I'm wondering if it would be possible for you (or someone else) to fix 
the issue with years from 1 to 999 being formatted incorrectly, and then 
creating a new ticket for the issue with negative years?


Best regards,

Nicklas

On 2011-09-12 07:02, Chris Hostetter wrote:

: The XML output when performing a query via the solr interface is like this:
:1-01-01T00:00:00Z

i think you mean:1-01-01T00:00:00Z

:>  >  So my question is: Is this a bug in the solr output engine, or should 
mono
:>  >  be able to parse the date as given from solr? I have not yet tried it out
:>  >  on .net as I do not have access to a windows machine at the moment.

it is in fact a bug in Solr that not a lot of people have been overly
concerned with, since most people don't deal with dates that far back

https://issues.apache.org/jira/browse/SOLR-1899

...I spent a little time working on it at one point but got sidetracked
by other things, since there are a couple of related issues with the
canonical iso8601 date format around year "0" that made it non-obvious
what the "ideal" solution was.

-Hoss




Re: Document row in solr Result

2011-09-12 Thread Eric Grobler
Hi Pierre,

Great idea, that will speed things up!

Thank you very much.

Regards
Ericz


On Mon, Sep 12, 2011 at 10:19 AM, Pierre GOSSE wrote:

> Hi Eric,
>
> If you want a query informing one customer of its product row at any given
> time, the easiest way is to filter on submission date greater than this
> customer's and return the result count. If you have 500 products with a
> later submission date, your row number is 501.
>
> Hope this helps,
>
> Pierre
>
>
> -Message d'origine-
> De : Eric Grobler [mailto:impalah...@googlemail.com]
> Envoyé : lundi 12 septembre 2011 11:00
> À : solr-user@lucene.apache.org
> Objet : Re: Document row in solr Result
>
> Hi Manish,
>
> Thank you for your time.
>
> For upselling reasons I want to inform the customer that:
> "your product is on the last page of the search result. However, click here
> to put your product back on the first page..."
>
>
> Here is an example:
> I have a phone with productid 635001 in the iphone category.
> When I sort this category by submissiondate this product will be near the
> end of the result (on row 9863 in this example).
> At the moment I have to scan nearly 10000 rows in the client to determine
> the position of this product.
> Is there a more efficient way to find the position of a specific document
> in
> a resultset without returning the full result?
>
> q=category:iphone
> fl=productid
> sort=submissiondate desc
> rows=10000
>
>  row  productid  submissiondate
>    1  656569     2011-09-12 08:12
>    2  656468     2011-09-12 08:03
>    3  656201     2011-09-11 23:41
>  ...
> 9863  635001     2011-08-11 17:22
>  ...
> 9922  634423     2011-08-10 21:51
>
> Regards
> Ericz
>
> On Mon, Sep 12, 2011 at 9:38 AM, Manish Bafna  >wrote:
>
> > You might not be able to find the row index.
> > Can you post your query in detail. The kind of inputs and outputs you are
> > expecting.
> >
> > On Mon, Sep 12, 2011 at 2:01 PM, Eric Grobler  > >wrote:
> >
> > > Hi Manish,
> > >
> > > Thanks for your reply - but how will that return me the row index of
> the
> > > original query.
> > >
> > > Regards
> > > Ericz
> > >
> > > On Mon, Sep 12, 2011 at 9:24 AM, Manish Bafna <
> manish.bafna...@gmail.com
> > > >wrote:
> > >
> > > > fq -> filter query parameter searches within the results.
> > > >
> > > > On Mon, Sep 12, 2011 at 1:49 PM, Eric Grobler <
> > impalah...@googlemail.com
> > > > >wrote:
> > > >
> > > > > Hi Solr experts,
> > > > >
> > > > > If you have a site with products sorted by submission date, the
> > product
> > > > of
> > > > > a
> > > > > customer might be on page 1 on the first day, and then move down to
> > > page
> > > > x
> > > > > as other customers submit newer entries.
> > > > >
> > > > > To find the row of a product you can of course run the query and
> loop
> > > > > through the result until you find the specific productid like:
> > > > > q=category:myproducttype
> > > > > fl=productid
> > > > > sort=submissiondate desc
> > > > > rows=10000
> > > > >
> > > > > But is there perhaps a more efficient way to do this? Maybe a
> special
> > > > > syntax
> > > > > to search within the result.
> > > > >
> > > > > Thanks
> > > > > Ericz
> > > > >
> > > >
> > >
> >
>


Re: OOM issue

2011-09-12 Thread Manish Bafna
Reducing the cache sizes will definitely reduce heap usage.

Can you run that xlsx file separately through Tika and see if you are getting
the OOM issue?
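
A quick standalone check could look like this (a sketch; the class name is
illustrative, and the JVM should be started with the same heap limit as your
Solr instance):

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaOomCheck {
  public static void main(String[] args) throws Exception {
    InputStream in = new FileInputStream(args[0]); // path to the 25MB xlsx
    try {
      BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
      new AutoDetectParser().parse(in, handler, new Metadata());
      System.out.println("Extracted " + handler.toString().length() + " chars");
    } finally {
      in.close();
    }
  }
}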

On Mon, Sep 12, 2011 at 3:09 PM, abhijit bashetti  wrote:

> I am facing the OOM issue.
>
> Other than increasing the RAM, can we change some other parameters to
> avoid the OOM issue?
>
>
> such as minimizing the filter cache size , document cache size etc.
>
> Can you suggest me some other option to avoid the OOM issue?
>
>
> Thanks in advance!
>
>
> Regards,
>
> Abhijit
>


OOM issue

2011-09-12 Thread abhijit bashetti
I am facing the OOM issue.

Other than increasing the RAM, can we change some other parameters to
avoid the OOM issue?


Such as minimizing the filter cache size, document cache size, etc.

Can you suggest some other options to avoid the OOM issue?
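
(For reference, shrinking those caches in solrconfig.xml would look roughly
like this; the sizes are only illustrative and should be tuned against your
query patterns:)

<filterCache class="solr.FastLRUCache" size="64" initialSize="16" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="64" initialSize="16" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="64" initialSize="16" autowarmCount="0"/>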


Thanks in advance!


Regards,

Abhijit


Re: Stemming and other tokenizers

2011-09-12 Thread Manish Bafna
What if a single document has multiple languages?

On Mon, Sep 12, 2011 at 2:23 PM, Jan Høydahl  wrote:

> Hi
>
> Everybody else uses a dedicated field per language, so why can't you?
> Please explain your use case, and perhaps we can better help understand
> what you're trying to do.
> Do you always know the query language in advance?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 12. sep. 2011, at 08:28, Patrick Sauts wrote:
>
> > I can't create one field per language, that is the problem but I'll dig
> into
> > it following your indications.
> > I'll let you know what I come up with.
> >
> > Patrick.
> >
> > 2011/9/11 Jan Høydahl 
> >
> >> Hi,
> >>
> >> You'll not be able to detect language and change stemmer on the same
> field
> >> in one go. You need to create one fieldType in your schema per language
> you
> >> want to use, and then use LanguageIdentification (SOLR-1979) to do the
> magic
> >> of detecting language and renaming the field. If you set
> >> langid.override=false, languid.map=true and populate your "language"
> field
> >> with the known language, you will probably get the desired effect.
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >> Solr Training - www.solrtraining.com
> >>
> >> On 10. sep. 2011, at 03:24, Patrick Sauts wrote:
> >>
> >>> Hello,
> >>>
> >>>
> >>>
> >>> I want to implement some kind of AutoStemming that will detect the
> >> language
> >>> of a field based on a tag at the start of this field like #en# my field
> >> is
> >>> stored on disc but I don't want this tag to be stored. Is there a way
> to
> >>> avoid this field to be stored ?
> >>>
> >>> To me all the filters and the tokenizers interact only with the indexed
> >>> field and not the stored one.
> >>>
> >>> Am I wrong ?
> >>>
> >>> Is it possible for you to do such a filter?
> >>>
> >>>
> >>>
> >>> Patrick.
> >>>
> >>
> >>
>
>


RE: Document row in solr Result

2011-09-12 Thread Pierre GOSSE
Hi Eric,

If you want a query informing one customer of its product row at any given 
time, the easiest way is to filter on submission date greater than this 
customer's and return the result count. If you have 500 products with a
later submission date, your row number is 501.

Hope this helps,

Pierre
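
For example, taking the product on row 9863 from the quoted mail below (a
sketch, assuming submissiondate is a Solr date field and using the product's
own date as the exclusive lower bound):

q=category:iphone
fq=submissiondate:{2011-08-11T17:22:00Z TO *}
rows=0

numFound in the response is the number of products submitted later, so under
sort=submissiondate desc the product's row is numFound + 1 (9862 + 1 = 9863
in the example).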


-Message d'origine-
De : Eric Grobler [mailto:impalah...@googlemail.com] 
Envoyé : lundi 12 septembre 2011 11:00
À : solr-user@lucene.apache.org
Objet : Re: Document row in solr Result

Hi Manish,

Thank you for your time.

For upselling reasons I want to inform the customer that:
"your product is on the last page of the search result. However, click here
to put your product back on the first page..."


Here is an example:
I have a phone with productid 635001 in the iphone category.
When I sort this category by submissiondate this product will be near the
end of the result (on row 9863 in this example).
At the moment I have to scan nearly 10000 rows in the client to determine
the position of this product.
Is there a more efficient way to find the position of a specific document in
a resultset without returning the full result?

q=category:iphone
fl=productid
sort=submissiondate desc
rows=10000

 row  productid  submissiondate
   1  656569     2011-09-12 08:12
   2  656468     2011-09-12 08:03
   3  656201     2011-09-11 23:41
 ...
9863  635001     2011-08-11 17:22
 ...
9922  634423     2011-08-10 21:51

Regards
Ericz

On Mon, Sep 12, 2011 at 9:38 AM, Manish Bafna wrote:

> You might not be able to find the row index.
> Can you post your query in detail. The kind of inputs and outputs you are
> expecting.
>
> On Mon, Sep 12, 2011 at 2:01 PM, Eric Grobler  >wrote:
>
> > Hi Manish,
> >
> > Thanks for your reply - but how will that return me the row index of the
> > original query.
> >
> > Regards
> > Ericz
> >
> > On Mon, Sep 12, 2011 at 9:24 AM, Manish Bafna  > >wrote:
> >
> > > fq -> filter query parameter searches within the results.
> > >
> > > On Mon, Sep 12, 2011 at 1:49 PM, Eric Grobler <
> impalah...@googlemail.com
> > > >wrote:
> > >
> > > > Hi Solr experts,
> > > >
> > > > If you have a site with products sorted by submission date, the
> product
> > > of
> > > > a
> > > > customer might be on page 1 on the first day, and then move down to
> > page
> > > x
> > > > as other customers submit newer entries.
> > > >
> > > > To find the row of a product you can of course run the query and loop
> > > > through the result until you find the specific productid like:
> > > > q=category:myproducttype
> > > > fl=productid
> > > > sort=submissiondate desc
> > > > rows=10000
> > > >
> > > > But is there perhaps a more efficient way to do this? Maybe a special
> > > > syntax
> > > > to search within the result.
> > > >
> > > > Thanks
> > > > Ericz
> > > >
> > >
> >
>


Re: Document row in solr Result

2011-09-12 Thread Eric Grobler
Hi Manish,

Thank you for your time.

For upselling reasons I want to inform the customer that:
"your product is on the last page of the search result. However, click here
to put your product back on the first page..."


Here is an example:
I have a phone with productid 635001 in the iphone category.
When I sort this category by submissiondate this product will be near the
end of the result (on row 9863 in this example).
At the moment I have to scan nearly 10000 rows in the client to determine
the position of this product.
Is there a more efficient way to find the position of a specific document in
a resultset without returning the full result?

q=category:iphone
fl=productid
sort=submissiondate desc
rows=10000

 row  productid  submissiondate
   1  656569     2011-09-12 08:12
   2  656468     2011-09-12 08:03
   3  656201     2011-09-11 23:41
 ...
9863  635001     2011-08-11 17:22
 ...
9922  634423     2011-08-10 21:51

Regards
Ericz

On Mon, Sep 12, 2011 at 9:38 AM, Manish Bafna wrote:

> You might not be able to find the row index.
> Can you post your query in detail. The kind of inputs and outputs you are
> expecting.
>
> On Mon, Sep 12, 2011 at 2:01 PM, Eric Grobler  >wrote:
>
> > Hi Manish,
> >
> > Thanks for your reply - but how will that return me the row index of the
> > original query.
> >
> > Regards
> > Ericz
> >
> > On Mon, Sep 12, 2011 at 9:24 AM, Manish Bafna  > >wrote:
> >
> > > fq -> filter query parameter searches within the results.
> > >
> > > On Mon, Sep 12, 2011 at 1:49 PM, Eric Grobler <
> impalah...@googlemail.com
> > > >wrote:
> > >
> > > > Hi Solr experts,
> > > >
> > > > If you have a site with products sorted by submission date, the
> product
> > > of
> > > > a
> > > > customer might be on page 1 on the first day, and then move down to
> > page
> > > x
> > > > as other customers submit newer entries.
> > > >
> > > > To find the row of a product you can of course run the query and loop
> > > > through the result until you find the specific productid like:
> > > > q=category:myproducttype
> > > > fl=productid
> > > > sort=submissiondate desc
> > > > rows=10000
> > > >
> > > > But is there perhaps a more efficient way to do this? Maybe a special
> > > > syntax
> > > > to search within the result.
> > > >
> > > > Thanks
> > > > Ericz
> > > >
> > >
> >
>


Re: select query does not find indexed pdf document

2011-09-12 Thread Jan Høydahl
Hi,

What do you get from a query http://www/SearchApp/select/?q=*:* or 
http://www/SearchApp/select/?q=id:vpn ?
You may not have mapped the fields correctly to your schema?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 12. sep. 2011, at 02:12, Michael Dockery wrote:

> I am new to solr.  
> 
> I tried to upload a pdf file via curl to my solr webapp (on tomcat)
> 
> curl 
> "http://www/SearchApp/update/extract?stream.file=c:\dmvpn.pdf&stream.contentType=application/pdf&literal.id=pdf&commit=true";
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">860</int>
>   </lst>
> </response>
> 
> but
> 
> http://www/SearchApp/select/?q=vpn
> 
> 
> does not find the document
> 
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">0</int>
>     <lst name="params">
>       <str name="q">vpn</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="0" start="0"/>
> </response>
> 
> help is appreciated.
> 
> =
> fyi
> I point my test webapp to the index/solr home via mod meta-data/context.xml
> 
> <Environment name="solr/home" type="java.lang.String" value="c:/solr_home" override="true" />
> 
> and I had to copy all these jars to my webapp lib dir: (to avoid the 
> classnotfound)
> Solr_download\contrib\extraction\lib
>   ...in the future i plan to put them in the tomcat/lib dir.
> 
> 
> Also, I have not modified conf\solrconfig.xml or schema.xml.



Re: Stemming and other tokenizers

2011-09-12 Thread Jan Høydahl
Hi

Everybody else uses a dedicated field per language, so why can't you?
Please explain your use case, and perhaps we can better help understand what 
you're trying to do.
Do you always know the query language in advance?
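
For reference, the SOLR-1979 setup mentioned below would look roughly like
this in solrconfig.xml (a sketch with illustrative field names; take the
exact processor class name from the patch):

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">text</str>
    <str name="langid.langField">language</str>
    <bool name="langid.override">false</bool>
    <bool name="langid.map">true</bool>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>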

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 12. sep. 2011, at 08:28, Patrick Sauts wrote:

> I can't create one field per language, that is the problem but I'll dig into
> it following your indications.
> I'll let you know what I come up with.
> 
> Patrick.
> 
> 2011/9/11 Jan Høydahl 
> 
>> Hi,
>> 
>> You'll not be able to detect language and change stemmer on the same field
>> in one go. You need to create one fieldType in your schema per language you
>> want to use, and then use LanguageIdentification (SOLR-1979) to do the magic
>> of detecting language and renaming the field. If you set
>> langid.override=false, langid.map=true and populate your "language" field
>> with the known language, you will probably get the desired effect.
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>> 
>> On 10. sep. 2011, at 03:24, Patrick Sauts wrote:
>> 
>>> Hello,
>>> 
>>> 
>>> 
>>> I want to implement some kind of AutoStemming that will detect the
>> language
>>> of a field based on a tag at the start of this field like #en# my field
>> is
>>> stored on disc but I don't want this tag to be stored. Is there a way to
>>> avoid this field to be stored ?
>>> 
>>> To me all the filters and the tokenizers interact only with the indexed
>>> field and not the stored one.
>>> 
>>> Am I wrong ?
>>> 
>>> Is it possible for you to do such a filter?
>>> 
>>> 
>>> 
>>> Patrick.
>>> 
>> 
>> 



Re: Document row in solr Result

2011-09-12 Thread Manish Bafna
You might not be able to find the row index.
Can you post your query in detail, with the kind of inputs and outputs you
are expecting?

On Mon, Sep 12, 2011 at 2:01 PM, Eric Grobler wrote:

> Hi Manish,
>
> Thanks for your reply - but how will that return me the row index of the
> original query.
>
> Regards
> Ericz
>
> On Mon, Sep 12, 2011 at 9:24 AM, Manish Bafna  >wrote:
>
> > fq -> filter query parameter searches within the results.
> >
> > On Mon, Sep 12, 2011 at 1:49 PM, Eric Grobler  > >wrote:
> >
> > > Hi Solr experts,
> > >
> > > If you have a site with products sorted by submission date, the product
> > of
> > > a
> > > customer might be on page 1 on the first day, and then move down to
> page
> > x
> > > as other customers submit newer entries.
> > >
> > > To find the row of a product you can of course run the query and loop
> > > through the result until you find the specific productid like:
> > > q=category:myproducttype
> > > fl=productid
> > > sort=submissiondate desc
> > > rows=10000
> > >
> > > But is there perhaps a more efficient way to do this? Maybe a special
> > > syntax
> > > to search within the result.
> > >
> > > Thanks
> > > Ericz
> > >
> >
>


Re: Document row in solr Result

2011-09-12 Thread Eric Grobler
Hi Manish,

Thanks for your reply - but how will that return me the row index of the
original query.

Regards
Ericz

On Mon, Sep 12, 2011 at 9:24 AM, Manish Bafna wrote:

> fq -> filter query parameter searches within the results.
>
> On Mon, Sep 12, 2011 at 1:49 PM, Eric Grobler  >wrote:
>
> > Hi Solr experts,
> >
> > If you have a site with products sorted by submission date, the product
> of
> > a
> > customer might be on page 1 on the first day, and then move down to page
> x
> > as other customers submit newer entries.
> >
> > To find the row of a product you can of course run the query and loop
> > through the result until you find the specific productid like:
> > q=category:myproducttype
> > fl=productid
> > sort=submissiondate desc
> > rows=10000
> >
> > But is there perhaps a more efficient way to do this? Maybe a special
> > syntax
> > to search within the result.
> >
> > Thanks
> > Ericz
> >
>


Re: Document row in solr Result

2011-09-12 Thread Manish Bafna
fq -> filter query parameter searches within the results.

On Mon, Sep 12, 2011 at 1:49 PM, Eric Grobler wrote:

> Hi Solr experts,
>
> If you have a site with products sorted by submission date, the product of
> a
> customer might be on page 1 on the first day, and then move down to page x
> as other customers submit newer entries.
>
> To find the row of a product you can of course run the query and loop
> through the result until you find the specific productid like:
> q=category:myproducttype
> fl=productid
> sort=submissiondate desc
> rows=10000
>
> But is there perhaps a more efficient way to do this? Maybe a special
> syntax
> to search within the result.
>
> Thanks
> Ericz
>


Document row in solr Result

2011-09-12 Thread Eric Grobler
Hi Solr experts,

If you have a site with products sorted by submission date, the product of a
customer might be on page 1 on the first day, and then move down to page x
as other customers submit newer entries.

To find the row of a product you can of course run the query and loop
through the result until you find the specific productid like:
q=category:myproducttype
fl=productid
sort=submissiondate desc
rows=10000

But is there perhaps a more efficient way to do this? Maybe a special syntax
to search within the result.

Thanks
Ericz


Re: OOM issue

2011-09-12 Thread abhijit bashetti
Yes, I am using TIKA for content extraction.

The xlsx file size is 25MB. Is there any other option to resolve the
OOM issue rather than increasing the RAM?


Can we change some other configuration params of solr to avoid the OOM issue?



Are you using Tika to do the extraction of content?
You might be getting OOM because of huge xlsx file.

Try having bigger RAM and you might not get the issue.

On Mon, Sep 12, 2011 at 12:44 PM, abhijit bashetti
 wrote:

> Hi,
>
> I am getting the OOM error.
>
> I am working with multi-core for solr . I am using DIH for indexing. I have
> also integrated TIKA for content extraction.
>
> I am using ORACLE 10g DB.
>
> In the solrconfig.xml , I have added
>
> <filterCache class="solr.FastLRUCache" size="512"
> initialSize="512"
> autowarmCount="0"/>
>
> <queryResultCache class="solr.LRUCache" size="512"
> initialSize="512"
> autowarmCount="0"/>
>
> <documentCache class="solr.LRUCache" size="512"
> initialSize="512"
> autowarmCount="0"/>
>
>
> <lockType>native</lockType>
>
>
> My indexing server is on linux with 8GB of ram.
> I am indexing huge document set. 10 cores are there. every core has 300 000
> documents.
>
> I got the OOM error for a xlsx document which is of 25MB size.
>
> On the Indexing server , I am doing indexing (first time indexing for a new
> core added) , re-indexing and searching also.
>
> Do I need to create multiple solr webapps to resolve the issues.
>
> Or I need add more RAM to the system so as to avoid OOM.
>
> Regards,
> Abhijit


Re: Running solr on small amounts of RAM

2011-09-12 Thread Toke Eskildsen
On Fri, 2011-09-09 at 18:48 +0200, Mike Austin wrote:
> Our index is very small with 100k documents and a light load at the moment.
> If I wanted to use the smallest possible RAM on the server, how would I do
> this and what are the issues?

The index size depends just as much on the size of the documents as the
number, but assuming that your documents are relatively small, I don't
see any issues. 100K is such a small amount that you will get fair OS
caching even on a very low memory server.

Plain searches work well with low memory, but faceting might be tricky,
as it requires a temporary memory overhead for each concurrent search.
Limiting the number of concurrent searches to 1 or 2 might be a good
idea.

During tests of hierarchical faceting with Solr trunk, I tried running
with 32MB for a very simple 1 million document index and it worked
surprisingly well (better with 48MB though). For stable Solr I would
expect the memory requirement to be somewhat higher.


Could you tell us what you're aiming at? What is low memory for you, how
large is your index in bytes, what response times do you hope for and
what is the expected query rate?



Re: OOM issue

2011-09-12 Thread Manish Bafna
Are you using Tika to do the extraction of content?
You might be getting OOM because of huge xlsx file.

Try having bigger RAM and you might not get the issue.

On Mon, Sep 12, 2011 at 12:44 PM, abhijit bashetti <
abhijitbashe...@gmail.com> wrote:

> Hi,
>
> I am getting the OOM error.
>
> I am working with multi-core for solr . I am using DIH for indexing. I have
> also integrated TIKA for content extraction.
>
> I am using ORACLE 10g DB.
>
> In the solrconfig.xml , I have added
>
> <filterCache class="solr.FastLRUCache" size="512"
> initialSize="512"
> autowarmCount="0"/>
>
> <queryResultCache class="solr.LRUCache" size="512"
> initialSize="512"
> autowarmCount="0"/>
>
> <documentCache class="solr.LRUCache" size="512"
> initialSize="512"
> autowarmCount="0"/>
>
>
> <lockType>native</lockType>
>
>
> My indexing server is on linux with 8GB of ram.
> I am indexing huge document set. 10 cores are there. every core has 300 000
> documents.
>
> I got the OOM error for a xlsx document which is of 25MB size.
>
> On the Indexing server , I am doing indexing (first time indexing for a new
> core added) , re-indexing and searching also.
>
> Do I need to create multiple solr webapps to resolve the issues.
>
> Or I need add more RAM to the system so as to avoid OOM.
>
> Regards,
> Abhijit
>


OOM issue

2011-09-12 Thread abhijit bashetti
Hi,

I am getting the OOM error.

I am working with multi-core for solr . I am using DIH for indexing. I have
also integrated TIKA for content extraction.

I am using ORACLE 10g DB.

In the solrconfig.xml, I have added

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>

<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

<lockType>native</lockType>


My indexing server is on linux with 8GB of ram.
I am indexing huge document set. 10 cores are there. every core has 300 000
documents.

I got the OOM error for a xlsx document which is of 25MB size.

On the Indexing server , I am doing indexing (first time indexing for a new
core added) , re-indexing and searching also.

Do I need to create multiple solr webapps to resolve the issues.

Or I need add more RAM to the system so as to avoid OOM.

Regards,
Abhijit


Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?

2011-09-12 Thread dpt9876
Thank you for the clarification and help, guys. I will try them.
On Sep 12, 2011 10:29 AM, "kkrugler [via Lucene]" <
ml-node+s472066n332847...@n3.nabble.com> wrote:
>
>
>
> On Sep 11, 2011, at 7:04pm, dpt9876 wrote:
>
>> Hi thanks for the reply.
>>
>> How does nutch/solr handle the scenario where 1 website calls price,
"price"
>> and another website calls it "cost". Same thing different name, yet I
would
>> want the facet to handle that and not create a different facet.
>>
>> Is this combo of nutch and Solr that intelligent and or intuitive?
>
> What you're describing here is web mining, not web crawling.
>
> You want to extract price data from web pages, and put that into a
specific field in Solr.
>
> To do that using Nutch, you'd need to write custom plug-ins that know how
to extract the price from a page, and add that as a custom field to the
crawl results.
>
> The above is a topic for the Nutch mailing list, since Solr is just a
downstream consumer of whatever Nutch provides.
>
> -- Ken
>
>> On Sep 12, 2011 9:06 AM, "Erick Erickson [via Lucene]" <
>> ml-node+s472066n3328340...@n3.nabble.com> wrote:
>>>
>>>
>>> Nope, there's nothing in Solr that crawls anything, you have to feed
>>> documents in yourself from the websites.
>>>
>>> Or, look at the Nutch project, see: http://nutch.apache.org/about.html
>>>
>>> which is designed for this kind of problem.
>>>
>>> Best
>>> Erick
>>>
>>> On Sun, Sep 11, 2011 at 8:53 PM, dpt9876 
>> wrote:
 Hi all,
 I am wondering if Solr will do the following for a project I am working
>> on.
 I want to create a search engine with facets for potentially hundreds
of
 websites.
 Similar to say crawling amazon + buy.com + ebay and someone can search
>> these
 3 sites from my 1 website.
 (I realise there are better ways of doing the above example, its for
 illustrative purposes).
 Eventually I would build that search crawl to index say 200 or 1000
 merchants.
 Someone would come to my site and search for "digital camera".

 They would get results from all 3 indexes and hopefully dynamic facets
eg
 Price $100-200
 Price 200-300
 Resolution 1mp-2mp

 etc etc

 Can this be done on the fly?

 I ask this because I am currently developing webscrapers to crawl these
 websites, dump that data into a db, then was thinking of tacking on a
>> solr
 server to crawl my db.

 Problem with that approach is that crawling the worlds ecommerce sites
>> will
 take forever, when it seems solr might do that for me? (I have read
about
 multiple indexes etc).

 Many thanks

 --
 View this message in context:
>>
http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328314.html
 Sent from the Solr - User mailing list archive at Nabble.com.

>>>
>
> --
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr
>
>
>
>
>

