date:20110911

Re: Stemming and other tokenizers

2011-09-11 Thread Patrick Sauts

I can't create one field per language, that is the problem, but I'll dig into
it following your indications.
I'll let you know what I come up with.

Patrick.

2011/9/11 Jan Høydahl 

> Hi,
>
> You'll not be able to detect language and change stemmer on the same field
> in one go. You need to create one fieldType in your schema per language you
> want to use, and then use LanguageIdentification (SOLR-1979) to do the magic
> of detecting language and renaming the field. If you set
> langid.override=false, langid.map=true and populate your "language" field
> with the known language, you will probably get the desired effect.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 10. sep. 2011, at 03:24, Patrick Sauts wrote:
>
> > Hello,
> >
> >
> >
> > I want to implement some kind of AutoStemming that will detect the language
> > of a field based on a tag at the start of the field, like #en#. My field is
> > stored on disk, but I don't want this tag to be stored. Is there a way to
> > avoid storing it?
> >
> > To me all the filters and the tokenizers interact only with the indexed
> > field and not the stored one.
> >
> > Am I wrong ?
> >
> > Is it possible to write such a filter?
> >
> >
> >
> > Patrick.
> >
>
>

Re: Adding Query Filter custom implementation to Solr's pipeline

2011-09-11 Thread Chris Hostetter


: When I was using Lucene directly I used a custom implementation of query 
: filter to enforce entitlements of search results. Now, that I'm 
: switching my infrastructure from custom host to Solr, what is the best 
: way to configure Solr to use my custom query filter for every request?

It depends on how complex your custom Filter was.  

Many people find that when using Solr, they can reimplement 
basic Filter logic using "fq" params and the built-in QParsers provided by 
Solr.  

If you do need to implement something truly custom, writing it as your 
own QParser to trigger via an "fq" can be advantageous so it can be cached 
and re-used by many queries.

If that doesn't cut it for you, some people implement their own 
SearchComponents to manipulate the Queries.

And as a last resort: you can always implement your own RequestHandler and 
directly use SolrIndexSearcher to execute the query any way you want -- 
but if you don't use the DocList/DocSet methods, other built-in features 
like faceting won't be very easy to use.

If you provide some more details on how your existing Filter works, people 
can provide more advice on what would make the most sense.

-Hoss
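
As a rough illustration of the first option above, an entitlement check can often be expressed as an "fq" parameter that the application adds to every request. The sketch below only builds the request URL. The field name allowed_groups, the host, and the helper name build_select_url are hypothetical and not taken from the original poster's setup.

```python
from urllib.parse import urlencode


def build_select_url(base_url, user_query, user_groups):
    """Build a Solr /select URL that restricts results with an fq filter.

    Assumes (hypothetically) that every document carries a multivalued
    field named allowed_groups listing the groups entitled to see it,
    and that group names are simple tokens needing no query escaping.
    """
    # OR together the groups the current user belongs to.
    group_clause = " OR ".join(user_groups)
    params = [
        ("q", user_query),
        # The filter is sent separately from q so that it can be
        # cached and reused across many different user queries.
        ("fq", "allowed_groups:(%s)" % group_clause),
    ]
    return base_url + "/select?" + urlencode(params)


url = build_select_url("http://localhost:8983/solr", "digital camera", ["sales", "staff"])
print(url)
```

Whether a plain "fq" like this is enough depends on how complex the original Filter was; anything that cannot be expressed as a query would still need one of the custom approaches described above.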

Re: Using multivalued field in map function

2011-09-11 Thread Chris Hostetter


: Hmmm, would it be simpler to do something like append
: a clause like this?
: BloggerId:12304^10 OR CoBloggerId:123404^5?

Definitely, but that won't guarantee you a strict ordering if there is a 
particularly good relevancy match.

There's a bunch of ways to go about something like this, but trying to use 
the map function is definitely overkill (even if it could work on 
multivalued fields)

this kind of thing is particularly easy with the sort by function feature 
added in 3.2 -- because any query can be used as a function ...

q=your_query&sort=query(BloggerId:12304)+desc,+query(CoBloggerId:123404)+desc,+score+desc


-Hoss
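
For anyone assembling that request from client code, here is a small sketch of how the sort parameter could be built and URL-encoded. It simply reproduces the sort clauses from the example above; the helper name build_sort_params is made up for illustration, and the field names are the ones used in this thread.

```python
from urllib.parse import parse_qs, urlencode


def build_sort_params(user_query, blogger_id, co_blogger_id):
    """Return an encoded query string that sorts by two query() functions, then score."""
    sort_clauses = [
        "query(BloggerId:%s) desc" % blogger_id,
        "query(CoBloggerId:%s) desc" % co_blogger_id,
        "score desc",
    ]
    # Solr expects the sort clauses as one comma-separated parameter.
    return urlencode({"q": user_query, "sort": ",".join(sort_clauses)})


params = build_sort_params("your_query", 12304, 123404)
print(params)
# Decoding the string again shows the sort clauses as Solr would receive them.
print(parse_qs(params)["sort"][0])
```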

Re: Solr and DateTimes - bug?

2011-09-11 Thread Chris Hostetter


: The XML output when performing a query via the solr interface is like this:
: 1-01-01T00:00:00Z

i think you mean: 1-01-01T00:00:00Z

: > > So my question is: Is this a bug in the solr output engine, or should mono
: > > be able to parse the date as given from solr? I have not yet tried it out
: > > on .net as I do not have access to a windows machine at the moment.

it is in fact a bug in Solr that not a lot of people have been overly 
concerned with, since most people don't deal with dates that far back

https://issues.apache.org/jira/browse/SOLR-1899

...I spent a little time working on it at one point but got sidetracked 
by other things, since there are a couple of related issues with the 
canonical ISO 8601 date format around year "0" that made it non-obvious 
what the "ideal" solution was.

-Hoss
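
Until the issue above is resolved, one possible client-side workaround is to left-pad the year before handing the string to a date parser. The sketch below is in Python rather than the .NET code used by the original poster, the function name pad_solr_year is made up, and it only handles positive years with fewer than four digits.

```python
import re

# Matches a one- to three-digit year at the start of a Solr date string,
# e.g. the "1" in "1-01-01T00:00:00Z".
_SHORT_YEAR = re.compile(r"^(\d{1,3})(-\d{2}-\d{2}T)")


def pad_solr_year(value):
    """Left-pad the year of a Solr date string to four digits.

    Strings that already have a four-digit year are returned unchanged.
    """
    return _SHORT_YEAR.sub(lambda m: m.group(1).zfill(4) + m.group(2), value)


print(pad_solr_year("1-01-01T00:00:00Z"))
print(pad_solr_year("2011-09-11T00:00:00Z"))
```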

Parameter not working for master/slave

2011-09-11 Thread William Bell

I am using 3.3 SOLR. I tried passing in -Denable.master=true and
-Denable.slave=true on the Slave machine.
Then I changed solrconfig.xml to reference each as per:

http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node

But this is not working. The enable parameter does not appear to work in 3.3.

Is this supposed to be working? What else can I do to debug it? How
can I see other parameters working in solrconfig.xml?

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076

Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?

2011-09-11 Thread Ken Krugler


On Sep 11, 2011, at 7:04pm, dpt9876 wrote:

> Hi thanks for the reply.
> 
> How does nutch/solr handle the scenario where 1 website calls price, "price"
> and another website calls it "cost". Same thing different name, yet I would
> want the facet to handle that and not create a different facet.
> 
> Is this combo of nutch and Solr that intelligent and or intuitive?

What you're describing here is web mining, not web crawling.

You want to extract price data from web pages, and put that into a specific 
field in Solr.

To do that using Nutch, you'd need to write custom plug-ins that know how to 
extract the price from a page, and add that as a custom field to the crawl 
results.

The above is a topic for the Nutch mailing list, since Solr is just a 
downstream consumer of whatever Nutch provides.

-- Ken
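
To make the "same thing, different name" part concrete: whatever does the extraction (a Nutch plug-in or a custom scraper) would need to map each site's label onto one field name before the document reaches Solr. A minimal Python sketch of that mapping step follows; the alias table and the function name normalize_record are invented for illustration and are not part of Nutch or Solr.

```python
# Hypothetical alias table: every label a site might use, mapped to the
# single field name that should be faceted on in Solr.
FIELD_ALIASES = {
    "price": "price",
    "cost": "price",
}


def normalize_record(raw):
    """Rename scraped keys so that synonyms end up in the same field."""
    doc = {}
    for name, value in raw.items():
        key = name.strip().lower()
        # Unknown labels are kept under their own (lowercased) name.
        doc[FIELD_ALIASES.get(key, key)] = value
    return doc


print(normalize_record({"Cost": "199.00", "title": "Digital camera"}))
```

With both sites feeding the same "price" field, a single facet covers them.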

> On Sep 12, 2011 9:06 AM, "Erick Erickson [via Lucene]" <
> ml-node+s472066n3328340...@n3.nabble.com> wrote:
>> 
>> 
>> Nope, there's nothing in Solr that crawls anything, you have to feed
>> documents in yourself from the websites.
>> 
>> Or, look at the Nutch project, see: http://nutch.apache.org/about.html
>> 
>> which is designed for this kind of problem.
>> 
>> Best
>> Erick
>> 
>> On Sun, Sep 11, 2011 at 8:53 PM, dpt9876 
> wrote:
>>> Hi all,
>>> I am wondering if Solr will do the following for a project I am working
> on.
>>> I want to create a search engine with facets for potentially hundreds of
>>> websites.
>>> Similar to say crawling amazon + buy.com + ebay and someone can search
> these
>>> 3 sites from my 1 website.
>>> (I realise there are better ways of doing the above example, its for
>>> illustrative purposes).
>>> Eventually I would build that search crawl to index say 200 or 1000
>>> merchants.
>>> Someone would come to my site and search for "digital camera".
>>> 
>>> They would get results from all 3 indexes and hopefully dynamic facets eg
>>> Price $100-200
>>> Price 200-300
>>> Resolution 1mp-2mp
>>> 
>>> etc etc
>>> 
>>> Can this be done on the fly?
>>> 
>>> I ask this because I am currently developing webscrapers to crawl these
>>> websites, dump that data into a db, then was thinking of tacking on a
> solr
>>> server to crawl my db.
>>> 
>>> Problem with that approach is that crawling the worlds ecommerce sites
> will
>>> take forever, when it seems solr might do that for me? (I have read about
>>> multiple indexes etc).
>>> 
>>> Many thanks
>>> 
>>> --
>>> View this message in context:
> http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328314.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
>> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328449.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr

Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?

2011-09-11 Thread dpt9876

Hi thanks for the reply.

How does nutch/solr handle the scenario where 1 website calls price, "price"
and another website calls it "cost". Same thing different name, yet I would
want the facet to handle that and not create a different facet.

Is this combo of nutch and Solr that intelligent and or intuitive?

Thanks for the fast response.
On Sep 12, 2011 9:06 AM, "Erick Erickson [via Lucene]" <
ml-node+s472066n3328340...@n3.nabble.com> wrote:
>
>
> Nope, there's nothing in Solr that crawls anything, you have to feed
> documents in yourself from the websites.
>
> Or, look at the Nutch project, see: http://nutch.apache.org/about.html
>
> which is designed for this kind of problem.
>
> Best
> Erick
>
> On Sun, Sep 11, 2011 at 8:53 PM, dpt9876 
wrote:
>> Hi all,
>> I am wondering if Solr will do the following for a project I am working
on.
>> I want to create a search engine with facets for potentially hundreds of
>> websites.
>> Similar to say crawling amazon + buy.com + ebay and someone can search
these
>> 3 sites from my 1 website.
>> (I realise there are better ways of doing the above example, its for
>> illustrative purposes).
>> Eventually I would build that search crawl to index say 200 or 1000
>> merchants.
>> Someone would come to my site and search for "digital camera".
>>
>> They would get results from all 3 indexes and hopefully dynamic facets eg
>> Price $100-200
>> Price 200-300
>> Resolution 1mp-2mp
>>
>> etc etc
>>
>> Can this be done on the fly?
>>
>> I ask this because I am currently developing webscrapers to crawl these
>> websites, dump that data into a db, then was thinking of tacking on a
solr
>> server to crawl my db.
>>
>> Problem with that approach is that crawling the worlds ecommerce sites
will
>> take forever, when it seems solr might do that for me? (I have read about
>> multiple indexes etc).
>>
>> Many thanks
>>
>> --
>> View this message in context:
http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328314.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328449.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?

2011-09-11 Thread Erick Erickson

Nope, there's nothing in Solr that crawls anything, you have to feed
documents in yourself from the websites.

Or, look at the Nutch project, see: http://nutch.apache.org/about.html

which is designed for this kind of problem.

Best
Erick

On Sun, Sep 11, 2011 at 8:53 PM, dpt9876  wrote:
> Hi all,
> I am wondering if Solr will do the following for a project I am working on.
> I want to create a search engine with facets for potentially hundreds of
> websites.
> Similar to say crawling amazon + buy.com + ebay and someone can search these
> 3 sites from my 1 website.
> (I realise there are better ways of doing the above example, its for
> illustrative purposes).
> Eventually I would build that search crawl to index say 200 or 1000
> merchants.
> Someone would come to my site and search for "digital camera".
>
> They would get results from all 3 indexes and hopefully dynamic facets eg
> Price $100-200
> Price 200-300
> Resolution 1mp-2mp
>
> etc etc
>
> Can this be done on the fly?
>
> I ask this because I am currently developing webscrapers to crawl these
> websites, dump that data into a db, then was thinking of tacking on a solr
> server to crawl my db.
>
> Problem with that approach is that crawling the worlds ecommerce sites will
> take forever, when it seems solr might do that for me? (I have read about
> multiple indexes etc).
>
> Many thanks
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328314.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?

2011-09-11 Thread dpt9876

Hi all,
I am wondering if Solr will do the following for a project I am working on.
I want to create a search engine with facets for potentially hundreds of
websites.
Similar to say crawling amazon + buy.com + ebay and someone can search these
3 sites from my 1 website.
(I realise there are better ways of doing the above example, its for
illustrative purposes).
Eventually I would build that search crawl to index say 200 or 1000
merchants.
Someone would come to my site and search for "digital camera".

They would get results from all 3 indexes and hopefully dynamic facets eg
Price $100-200
Price 200-300
Resolution 1mp-2mp

etc etc

Can this be done on the fly?

I ask this because I am currently developing webscrapers to crawl these
websites, dump that data into a db, then was thinking of tacking on a solr
server to crawl my db.

Problem with that approach is that crawling the worlds ecommerce sites will
take forever, when it seems solr might do that for me? (I have read about
multiple indexes etc).

Many thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328314.html
Sent from the Solr - User mailing list archive at Nabble.com.

select query does not find indexed pdf document

2011-09-11 Thread Michael Dockery

I am new to solr.  

I tried to upload a pdf file via curl to my solr webapp (on tomcat)

curl "http://www/SearchApp/update/extract?stream.file=c:\dmvpn.pdf&stream.contentType=application/pdf&literal.id=pdf&commit=true"





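<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">860</int></lst>
</response>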



but

http://www/SearchApp/select/?q=vpn


does not find the document




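<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="q">vpn</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>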






help is appreciated.

=
fyi
I point my test webapp to the index/solr home via mod meta-data/context.xml

   

and I had to copy all these jars to my webapp lib dir: (to avoid the 
classnotfound)
Solr_download\contrib\extraction\lib
  ...in the future i plan to put them in the tomcat/lib dir.


Also, I have not modified conf\solrconfig.xml or schema.xml.

Re: Full-search index for the database

2011-09-11 Thread Erick Erickson

How much search-specific stuff are we talking here? Do you want to
do stemming? Plurals? Or are you talking exact match? Phrases?
multi-word queries? If exact match on individual terms
is all you want, you could hack something together like this:

index each term into a catch-all field with the field appended, something
like
val1|field1 val2|field2
be sure you don't use an analysis chain that splits on non-letters. Then, for
each term, append |* to the term and your returned terms will have the
field they came from. Of course you'll have to "do the right thing" with the
results to show them correctly, but that'd work.

But this is really abusing Solr. I wonder if this is an "XY problem", so
can you explain what it is you're trying to do at a higher level and maybe
we can suggest some other approach?

You could also have some kind of hybrid solution that searched with
Solr (not using the trick above) and just returned the PK from Solr,
then go to the DB to fill things out.

Best
Erick
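
A rough Python sketch of the hack described above, in case it helps to see it spelled out. It only shows the string handling on the client side: building the "term|field" tokens, forming the prefix query, and reading the field name back out of the matched terms. The function names are invented, lowercasing is an extra assumption, and query-syntax escaping of the separator is left out.

```python
def to_catch_all_tokens(row):
    """Turn one database row ({field: value}) into 'term|field' tokens."""
    tokens = []
    for field, value in row.items():
        # Split on whitespace only; the Solr field must use an analysis
        # chain that keeps the '|' and the field suffix intact.
        for term in str(value).lower().split():
            tokens.append("%s|%s" % (term, field))
    return tokens


def prefix_query(term):
    """Append |* so the term matches regardless of which field it came from."""
    return "%s|*" % term.lower()


def source_fields(matched_terms):
    """Recover the originating field names from matched 'term|field' tokens."""
    return sorted({t.rsplit("|", 1)[1] for t in matched_terms})


print(to_catch_all_tokens({"field1": "val1", "field2": "val2"}))
print(prefix_query("val1"))
print(source_fields(["val1|field1", "val1|field3"]))
```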

On Sun, Sep 11, 2011 at 7:06 PM, Eugeny Balakhonov  wrote:
> My task is very simple:
>
> I have a big database with a lot tables and fields. This database has
> dynamic structure and can be extended or changed in any time.
> I need a tool for full-search possibility via all fields in all tables of my
> database. On the input of this tool - some text for search. On the output -
> some unique key and the name of field which contains this text.
>
>
> Solr is very good selection, but I have serious problem with it: all Solr
> query parsers (standard, dismax, edismax) requires explicit declaration of
> fields for search. But list of these fields in my case is very and very big!
> And at search time I don't know all field names in  the database.
>
> I think that my task is not unique. According google a lot of people tries
> to solve same problems with Solr.
>
> May be good idea to add more flexible possibilities for search in all
> indexed fields?
>
>
> I see following variants:
>
> 1. Add wildcards in the qf parameter for dismax/edismax query parsers.
>
> 2. Add possibility to store source field name in  operator in
> schema.xml. In this case user can do following:
>
> a) create field for default search:
>  multiValued="true"/>
> ...
> TEXT
>
> b) copy all fields to default search field:
> 
>
> c) In query response user can receive needed source field name:
>
> 
>  
>  
>  foo foo foo test foo foo
>  
>  
>
>
> 2011/9/12 Eugeny Balakhonov 
>
>> Hello,
>>
>> Thanks for answer!
>>
>> I have created separate fields in mysolr schema for each field in database
>> (more than 500!). How to ask parser for search via all these fields? By
>> default Solr schema should contain explicit declaration of default search
>> field like following:
>>
>> TEXT
>>
>> I tried to use following search query:
>>
>> .?q=*:search text&hl=on&defType=edismax
>>
>> In this case search goes across default search field.
>>
>> I can't concatenate all 500 database field names in a big search
>> expression.
>>
>>
>> 2011/9/11 Jamie Johnson 
>>
>>> You should create separate fields in your solr schema for each field
>>> in your database that you want recognized separately.  You can use a
>>> query parser like edismax to do a weighted query across all of your
>>> fields and then provide highlighting on the specific field which
>>> matched.
>>>
>>> 2011/9/10 Eugeny Balakhonov :
>>> > I want to create full-text search for my database.
>>> >
>>> > It means that search engine should look up some string for all fields of
>>> my
>>> > database.
>>> >
>>> > I have created Solr configuration for extracting and indexing data from
>>> a
>>> > database.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > According documentation in the file schema.xml I have created field for
>>> > full-text search index:
>>> >
>>> >
>>> >
>>> > >> > multiValued="true"/>
>>> >
>>> >
>>> >
>>> > Also I have added strings for copying all values of all fields into this
>>> > full-search field:
>>> >
>>> >
>>> >
>>> > ...
>>> >
>>> >    
>>> >
>>> > ...
>>> >
>>> >
>>> >
>>> > In result I have possibility to search for all fields in my database.
>>> But I
>>> > can't recognize which field in the found record contains requested
>>> string.
>>> >
>>> > Highlighting functionality just marks string in the "TEXT" field like
>>> > following:
>>> >
>>> >
>>> >
>>> > 
>>> >
>>> > 
>>> >
>>> >  
>>> >
>>> >    Any text any text Test"
>>> >
>>> >  
>>> >
>>> > 
>>> >
>>> > 
>>> >
>>> >  
>>> >
>>> >   Any text any text Test"
>>> >
>>> >  
>>> >
>>> > 
>>> >
>>> >
>>> >
>>> > How to create full-search index with possibility to recognize source
>>> > database field?
>>> >
>>> >
>>> >
>>> > Thx a lot.
>>> >
>>> > Eugeny
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> Best regards,
>> Eugeny Balakhonov
>>
>
>
>
> --
> Best regards,
> Eugeny Balakhonov
>

Re: Full-search index for the database

2011-09-11 Thread Eugeny Balakhonov

My task is very simple:

I have a big database with a lot of tables and fields. This database has a
dynamic structure and can be extended or changed at any time.
I need a tool for full-text search across all fields in all tables of my
database. The input of this tool is some text to search for. The output is
a unique key and the name of the field which contains this text.


Solr is a very good choice, but I have a serious problem with it: all Solr
query parsers (standard, dismax, edismax) require an explicit declaration of
the fields to search. But the list of these fields in my case is very, very big!
And at search time I don't know all the field names in the database.

I think that my task is not unique. According to Google, a lot of people are
trying to solve the same problem with Solr.

Maybe it would be a good idea to add more flexible possibilities for searching
in all indexed fields?


I see the following variants:

1. Add wildcards in the qf parameter for dismax/edismax query parsers.

2. Add possibility to store source field name in  operator in
schema.xml. In this case user can do following:

a) create field for default search:

...
TEXT

b) copy all fields to default search field:


c) In query response user can receive needed source field name:


 
 
  foo foo foo test foo foo
  
  


2011/9/12 Eugeny Balakhonov 

> Hello,
>
> Thanks for answer!
>
> I have created separate fields in mysolr schema for each field in database
> (more than 500!). How to ask parser for search via all these fields? By
> default Solr schema should contain explicit declaration of default search
> field like following:
>
> TEXT
>
> I tried to use following search query:
>
> .?q=*:search text&hl=on&defType=edismax
>
> In this case search goes across default search field.
>
> I can't concatenate all 500 database field names in a big search
> expression.
>
>
> 2011/9/11 Jamie Johnson 
>
>> You should create separate fields in your solr schema for each field
>> in your database that you want recognized separately.  You can use a
>> query parser like edismax to do a weighted query across all of your
>> fields and then provide highlighting on the specific field which
>> matched.
>>
>> 2011/9/10 Eugeny Balakhonov :
>> > I want to create full-text search for my database.
>> >
>> > It means that search engine should look up some string for all fields of
>> my
>> > database.
>> >
>> > I have created Solr configuration for extracting and indexing data from
>> a
>> > database.
>> >
>> >
>> >
>> >
>> >
>> > According documentation in the file schema.xml I have created field for
>> > full-text search index:
>> >
>> >
>> >
>> > > > multiValued="true"/>
>> >
>> >
>> >
>> > Also I have added strings for copying all values of all fields into this
>> > full-search field:
>> >
>> >
>> >
>> > ...
>> >
>> >
>> >
>> > ...
>> >
>> >
>> >
>> > In result I have possibility to search for all fields in my database.
>> But I
>> > can't recognize which field in the found record contains requested
>> string.
>> >
>> > Highlighting functionality just marks string in the "TEXT" field like
>> > following:
>> >
>> >
>> >
>> > 
>> >
>> > 
>> >
>> >  
>> >
>> >Any text any text Test"
>> >
>> >  
>> >
>> > 
>> >
>> > 
>> >
>> >  
>> >
>> >   Any text any text Test"
>> >
>> >  
>> >
>> > 
>> >
>> >
>> >
>> > How to create full-search index with possibility to recognize source
>> > database field?
>> >
>> >
>> >
>> > Thx a lot.
>> >
>> > Eugeny
>> >
>> >
>>
>
>
>
> --
> Best regards,
> Eugeny Balakhonov
>



-- 
Best regards,
Eugeny Balakhonov

Re: Solr and DateTimes - bug?

2011-09-11 Thread Nicklas Overgaard


Hi,

The XML output when performing a query via the solr interface is like this:
1-01-01T00:00:00Z

It's solr 3.3.0 on an ArchLinux desktop machine with "OpenJDK 
6.b22_1.10.3-1" as my java runtime environment.


/Nicklas

On 2011-09-12 00:26, Jan Høydahl wrote:

Hi,

Can you try to make a plain HTTP query from the admin GUI on your index and 
tell us what the XML response is for that date field?
http://localhost:8983/solr/select?q=*:*
If that date output is wrong as well, there may be a bug with Solr. If it is 
correct, you have a problem in SolrNet.

Btw, which version of Solr do you use?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 12. sep. 2011, at 00:28, Nicklas Overgaard wrote:


Hi everybody,

I just started playing around with solr, but I'm facing some trouble. The 
test data I'm indexing with solr contains, amongst other things, dates and 
times.

By the way, I'm using mono and I'm talking to solr through the SolrNet library.

The issue I'm facing:

Some of the dates correspond to the DateTime.MinValue of .net, which is "0001-01-01 
00:00:00". When this date is returned from Solr, it's returned as 
"1-01-01T00:00:00Z". Now, I figured out that solr is supposed to return dates 
according to the ISO 8601 standard, but the above output is not in that format.

This basically leads to mono breaking down because it's not able to parse the above date. 
If I add three leading zeroes, it parses just fine (so it becomes 
"0001-01-01T00:00:00Z", the correct ISO 8601 format).

So my question is: Is this a bug in the solr output engine, or should mono be 
able to parse the date as given from solr? I have not yet tried it out on .net 
as I do not have access to a windows machine at the moment.

Best regards,

Nicklas

Re: Nested documents

2011-09-11 Thread Michael McCandless

Even if it applies, this is for Lucene.  I don't think we've added
Solr support for this yet... we should!

Mike McCandless

http://blog.mikemccandless.com

On Sun, Sep 11, 2011 at 12:16 PM, Erick Erickson
 wrote:
> Does this JIRA apply?
>
> https://issues.apache.org/jira/browse/LUCENE-3171
>
> Best
> Erick
>
> On Sat, Sep 10, 2011 at 8:32 PM, Andy  wrote:
>> Hi,
>>
>> Does Solr support nested documents? If not is there any plan to add such a 
>> feature?
>>
>> Thanks.
>

Re: Solr and DateTimes - bug?

2011-09-11 Thread Jan Høydahl

Hi,

Can you try to make a plain HTTP query from the admin GUI on your index and 
tell us what the XML response is for that date field?
http://localhost:8983/solr/select?q=*:*
If that date output is wrong as well, there may be a bug with Solr. If it is 
correct, you have a problem in SolrNet.

Btw, which version of Solr do you use?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 12. sep. 2011, at 00:28, Nicklas Overgaard wrote:

> Hi everybody,
> 
> I just started playing around with solr, however i'm facing some trouble. The 
> test data i'm indexing with solr is, amongst other things, containing date 
> and times.
> 
> By the way, I'm using mono and i'm talking to solr through the SolrNet 
> library.
> 
> The issue i'm facing:
> 
> Some of the dates corresponds to the DateTime.MinValue of .net, which is 
> "0001-01-01 00:00:00". When this date is returned from Solr, it's returned 
> like "1-01-01T00:00:00Z". Now, I figured out that solr supposedly should 
> return dates according to the ISO 8601 standard - but the above output is not 
> in that format.
> 
> This basically leads to mono breaking down because it's not able to parse the 
> above date. If i add three leading zeroes, it parses just fine (so it becomes 
> "0001-01-01T00:00:00Z", the correct ISO 8601 format).
> 
> So my question is: Is this a bug in the solr output engine, or should mono be 
> able to parse the date as given from solr? I have not yet tried it out on 
> .net as I do not have access to a windows machine at the moment.
> 
> Best regards,
> 
> Nicklas

Re: Full-search index for the database

2011-09-11 Thread Eugeny Balakhonov

Hello,

Thanks for answer!

I have created separate fields in my Solr schema for each field in the database
(more than 500!). How do I ask the parser to search across all these fields? By
default the Solr schema should contain an explicit declaration of the default
search field, like the following:

TEXT

I tried to use the following search query:

.?q=*:search text&hl=on&defType=edismax

In this case the search goes across the default search field.

I can't concatenate all 500 database field names into one big search expression.


2011/9/11 Jamie Johnson 

> You should create separate fields in your solr schema for each field
> in your database that you want recognized separately.  You can use a
> query parser like edismax to do a weighted query across all of your
> fields and then provide highlighting on the specific field which
> matched.
>
> 2011/9/10 Eugeny Balakhonov :
> > I want to create full-text search for my database.
> >
> > It means that search engine should look up some string for all fields of
> my
> > database.
> >
> > I have created Solr configuration for extracting and indexing data from a
> > database.
> >
> >
> >
> >
> >
> > According documentation in the file schema.xml I have created field for
> > full-text search index:
> >
> >
> >
> >  > multiValued="true"/>
> >
> >
> >
> > Also I have added strings for copying all values of all fields into this
> > full-search field:
> >
> >
> >
> > ...
> >
> >
> >
> > ...
> >
> >
> >
> > In result I have possibility to search for all fields in my database. But
> I
> > can't recognize which field in the found record contains requested
> string.
> >
> > Highlighting functionality just marks string in the "TEXT" field like
> > following:
> >
> >
> >
> > 
> >
> > 
> >
> >  
> >
> >Any text any text Test"
> >
> >  
> >
> > 
> >
> > 
> >
> >  
> >
> >   Any text any text Test"
> >
> >  
> >
> > 
> >
> >
> >
> > How to create full-search index with possibility to recognize source
> > database field?
> >
> >
> >
> > Thx a lot.
> >
> > Eugeny
> >
> >
>



-- 
Best regards,
Eugeny Balakhonov

Re: Running solr on small amounts of RAM

2011-09-11 Thread Jan Høydahl

Hi,

Beware that the Solr 4.0 branch has multiple RAM-conserving optimizations which may 
cause your index to take considerably less space, so try it out.
Also, of course, prune your schema to turn off everything you don't need, and 
also your OS to stop services you don't use.
Consider disallowing certain types of queries from the clients (such as 
wildcard, sorting, fuzzy etc.) to avoid getting into high-memory situations.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 11. sep. 2011, at 17:59, Erick Erickson wrote:

> Well, this answer isn't much more satisfactory than "get more memory",
> but about all I can say is "try it and see".
> 
> Sure, make your caches very small and monitor memory and test it out.
> 
> You'll get a sense of how fast (or slow) the queries are pretty quickly. Or
> you can get a ballpark estimate of what running without caches would
> do performance wise by simply measuring the first query after a restart.
> 
> Because, unfortunately, "it depends" is the only accurate answer. It
> depends on how much sorting, faceting etc. you do as well as the
> queries themselves.
> 
> Best
> Erick
> 
> On Fri, Sep 9, 2011 at 12:48 PM, Mike Austin  wrote:
>> I'm trying to push to get solr used in our environment. I know I could have
>> responses saying WHY can't you get more RAM etc.., but lets just skip those
>> and work with this situation.
>> 
>> Our index is very small with 100k documents and a light load at the moment.
>> If I wanted to use the smallest possible RAM on the server, how would I do
>> this and what are the issues?
>> 
>> I know that caching would be the biggest loss, but if solr ran with little
>> to no caching, would the performance still be ok? I know this is a relative
>> question..
>> This is the only application using java on this machine, would tuning java
>> to use less cache help anything?
>> I should set the cache settings low in the config?
>> Basically, what will having a very low cache hit rate do to search speed and
>> server performance?  I know more is better and it depends on what I'm
>> comparing it to but if you could just answer in some way saying that it's
>> not going to cripple the machine or cause 5 second searches?
>> 
>> It's on a windows server.
>> 
>> 
>> Thanks,
>> Mike
>>

Re: Stemming and other tokenizers

2011-09-11 Thread Jan Høydahl

Hi,

You'll not be able to detect language and change stemmer on the same field in 
one go. You need to create one fieldType in your schema per language you want 
to use, and then use LanguageIdentification (SOLR-1979) to do the magic of 
detecting language and renaming the field. If you set langid.override=false, 
langid.map=true and populate your "language" field with the known language, 
you will probably get the desired effect.
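
To make the field-per-language idea concrete, here is a minimal client-side sketch in Python. It is not the SOLR-1979 update processor; it only shows stripping a leading tag such as #en# before indexing and picking a per-language field for the remaining text. The field names text_en, text_fr and text_general are assumptions for illustration, and each would need a matching fieldType in schema.xml.

```python
import re

# Matches a leading language tag such as "#en#" (two lowercase letters).
TAG = re.compile(r"^#([a-z]{2})#\s*")

# Hypothetical per-language fields; each needs its own fieldType in schema.xml.
FIELD_BY_LANG = {"en": "text_en", "fr": "text_fr"}
DEFAULT_FIELD = "text_general"  # assumed fallback for unknown or missing tags


def route(raw):
    """Return (field_name, text) with the language tag removed from text."""
    m = TAG.match(raw)
    if not m:
        return DEFAULT_FIELD, raw
    field = FIELD_BY_LANG.get(m.group(1), DEFAULT_FIELD)
    return field, raw[m.end():]


print(route("#en# my field"))
```

Because the tag is removed before the document is sent, neither the stored nor the indexed value contains it.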

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 10. sep. 2011, at 03:24, Patrick Sauts wrote:

> Hello,
> 
> 
> 
> I want to implement some kind of AutoStemming that will detect the language
> of a field based on a tag at the start of this field like #en# my field is
> stored on disc but I don't want this tag to be stored. Is there a way to
> avoid this field to be stored ?
> 
> To me all the filters and the tokenizers interact only with the indexed
> field and not the stored one.
> 
> Am I wrong ?
> 
> Is it possible to you to do such a filter.
> 
> 
> 
> Patrick.
>

Re: Example Solr Config on EC2

2011-09-11 Thread Pulkit Singhal

Just to clarify, that link doesn't do anything to promote an already running
slave into a master. One would have to bounce the Solr node which has that
slave and then make the shift.  It is not something that happens at runtime
live.

On Wed, Aug 10, 2011 at 4:04 PM, Akshay  wrote:

> Yes you can promote a slave to be master refer
>
> http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node
>
> In AWS one can use an elastic IP(http://aws.amazon.com/articles/1346) to
> refer to the master and this can be assigned to slaves as they assume the
> role of master(in case of failure). All slaves will then refer to this new
> master and there will be no need to regenerate data.
>
> Automation of this may be possible through CloudWatch alarm-actions. I don't
> know of any available example automation scripts.
>
> Cheers
> Akshay.
>
> On Wed, Aug 10, 2011 at 9:08 PM, Matt Shields 
> wrote:
>
> > If I were to build a master with multiple slaves, is it possible to
> promote
> > a slave to be the new master if the original master fails?  Will all the
> > slaves pickup right where they left off, or any time the master fails
> will
> > we need to completely regenerate all the data?
> >
> > If this is possible, are there any examples of this being automated?
> >  Especially on Win2k3.
> >
> > Matthew Shields
> > Owner
> > BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation,
> > Managed Services
> > www.beantownhost.com
> > www.sysadminvalley.com
> > www.jeeprally.com
> >
> >
> >
> > On Mon, Aug 8, 2011 at 5:34 PM,  wrote:
> >
> > > Matthew,
> > >
> > > Here's another resource:
> > >
> > >
> >
> http://www.lucidimagination.com/blog/2010/02/01/solr-shines-through-the-cloud-lucidworks-solr-on-ec2/
> > >
> > >
> > > Michael Bohlig
> > > Lucid Imagination
> > >
> > >
> > >
> > > - Original Message 
> > > From: Matt Shields 
> > > To: solr-user@lucene.apache.org
> > > Sent: Mon, August 8, 2011 2:03:20 PM
> > > Subject: Example Solr Config on EC2
> > >
> > > I'm looking for some examples of how to setup Solr on EC2.  The
> > > configuration I'm looking for would have multiple nodes for redundancy.
> > > I've tested in-house with a single master and slave with replication
> > > running in Tomcat on Windows Server 2003, but even if I have multiple
> > > slaves
> > > the single master is a single point of failure.  Any suggestions or
> > example
> > > configurations?  The project I'm working on is a .NET setup, so ideally
> > I'd
> > > like to keep this search cluster on Windows Server, even though I
> prefer
> > > Linux.
> > >
> > > Matthew Shields
> > > Owner
> > > BeanTown Host - Web Hosting, Domain Names, Dedicated Servers,
> Colocation,
> > > Managed Services
> > > www.beantownhost.com
> > > www.sysadminvalley.com
> > > www.jeeprally.com
> > >
> > >
> >
>

Solr and DateTimes - bug?

2011-09-11 Thread Nicklas Overgaard


Hi everybody,

I just started playing around with solr, however i'm facing some 
trouble. The test data i'm indexing with solr is, amongst other things, 
containing date and times.


By the way, I'm using mono and i'm talking to solr through the SolrNet 
library.


The issue i'm facing:

Some of the dates correspond to the DateTime.MinValue of .NET, which is 
"0001-01-01 00:00:00". When this date is returned from Solr, it's 
returned like "1-01-01T00:00:00Z". Now, I figured out that solr 
supposedly should return dates according to the ISO 8601 standard - but 
the above output is not in that format.


This basically leads to mono breaking down because it's not able to 
parse the above date. If i add three leading zeroes, it parses just fine 
(so it becomes "0001-01-01T00:00:00Z", the correct ISO 8601 format).


So my question is: Is this a bug in the solr output engine, or should 
mono be able to parse the date as given from solr? I have not yet tried 
it out on .net as I do not have access to a windows machine at the moment.
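
Whichever side turns out to be at fault, a possible client-side workaround is to pad the year to four digits before parsing. A minimal sketch in Python (the real fix for a SolrNet/mono setup would live in the .NET parsing code; the function name pad_year is made up for illustration):

```python
import re

# Year of 1 to 4 digits, followed by the rest of a UTC timestamp ending in "Z".
SHORT_YEAR = re.compile(r"^(\d{1,4})(-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)$")


def pad_year(value):
    """Left-pad the year to four digits; return anything else unchanged."""
    m = SHORT_YEAR.match(value)
    if not m:
        return value
    return m.group(1).zfill(4) + m.group(2)


print(pad_year("1-01-01T00:00:00Z"))  # 0001-01-01T00:00:00Z
```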


Best regards,

Nicklas

Re: SolrCloud Feedback

2011-09-11 Thread Mark Miller


On Sep 9, 2011, at 1:09 PM, Pulkit Singhal wrote:

> I think I understand it a bit better now but wouldn't mind some validation.
> 
> 1) solr.xml does not become part of ZooKeeper

Right - currently it does not. Info is put there to tell Solr how to connect to 
zookeeper and register the cores.

> 2) The default looks like this out-of-box:
>  
>
>  
> so that may leave one wondering where the core's association to a
> collection name is made?
> 
> It can be made like so:
> a) statically in a file:
> 
> b) at start time via java:
> java ... -Dcollection.configName=myconf ... -jar start.jar

These are two different things. First, just to make the bootstrap case simple, 
if you don't specify a collection name, it defaults to the SolrCore name. That 
is why we make a default SolrCore name of collection1. In the simple wiki 
SolrCloud example, you can avoid naming the collection on each shard and simply 
have things come up under collection1 by default.

a) shows how to override using the SolrCore name for the collection name.

b) shows how to set the configuration set name for the config files that you 
upload with -Dbootstrap_confdir=. If you specify nothing for 
collection.configName, it defaults to configuration1.

> 
> And I'm guessing that since the core's name ("collection1") for shard1
> has already been associated with -Dcollection.configname=myconf in
> http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster
> once already, adding an additional shard2 with the same core name
> ("collection1"), automatically throws it in with the collection name
> ("myconf") without any need to specify anything at startup via -D or
> statically in solr.xml file.

"myconf" is not the collection name - it's the name of a collection of 
configuration files. If only one such set exists, you don't have to specify 
which to use (which you would do by changing the value at a given node in the 
zookeeper layout). If you wanted multiple named collection file sets, you would 
have to explicitly set each collection -> name configuration file set.

> 
> Validate away otherwise I'll just accept any hate mail after making
> edits to the Solr wiki directly.
> 
> - Pulkit
> 
> On Fri, Sep 9, 2011 at 11:38 AM, Pulkit Singhal  
> wrote:
>> Hello Jan,
>> 
>> You've made a very good point in (b). I would be happy to make the
>> edit to the wiki if I understood your explanation completely.
>> 
>> When you say that it is "looking up what collection that core is part
>> of" ... I'm curious how a core is being put under a particular
>> collection in the first place? And what that collection is named?
>> Obviously you've made it clear that collection1 is really the name of
>> the core itself. And where this association is being stored for the
>> code to look it up?
>> 
>> If not Jan, then perhaps the gurus who wrote Solr Cloud could answer :)
>> 
>> Thanks!
>> - Pulkit
>> 
>> On Thu, Feb 10, 2011 at 9:10 AM, Jan Høydahl  wrote:
>>> Hi,
>>> 
>>> I have so far just tested the examples and got a N by M cluster running. My 
>>> feedback:
>>> 
>>> a) First of all, a major update of the SolrCloud Wiki is needed, to clearly 
>>> state what is in which version, what are current improvement plans and get 
>>> rid of outdated stuff. That said I think there are many good ideas there.
>>> 
>>> b) The "collection" terminology is too much confused with "core", and 
>>> should probably be made more distinct. I just tried to configure two cores 
>>> on the same Solr instance into the same collection, and that worked fine, 
>>> both as distinct shards and as same shard (replica). The wiki examples give 
>>> the impression that "collection1" in 
>>> localhost:8983/solr/collection1/select?distrib=true is some magic 
>>> collection identifier, but what it really does is doing the query on the 
>>> *core* named "collection1", looking up what collection that core is part of 
>>> and distributing the query to all shards in that collection.
>>> 
>>> c) ZK is not designed to store large files. While the files in conf are 
>>> normally well below the 1M limit ZK imposes, we should perhaps consider 
>>> using a lightweight distributed object or k/v store for holding the 
>>> /CONFIGS and let ZK store a reference only
>>> 
>>> d) How are admins supposed to update configs in ZK? Install their favourite 
>>> ZK editor?
>>> 
>>> e) We should perhaps not be so afraid to make ZK a requirement for Solr in 
>>> v4. Ideally you should interact with a 1-node Solr in the same manner as 
>>> you do with a 100-node Solr. An example is the Admin GUI where the "schema" 
>>> and "solrconfig" links assume local file. This requires decent tool support 
>>> to make ZK interaction intuitive, such as "import" and "export" commands.
>>> 
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> 
>>> On 19. jan. 2011, at 21.07, Mark Miller wrote:
>>> 
 Hello Users,
 
 About a little over a year ago, a few of us started workin

Re: solr equivalent of "select distinct"

2011-09-11 Thread Michael Sokolov

You can get what you want - unique lists of values from docs matching 
your query - for a single field (using facets), but not for the 
co-occurrence of two field values.  So you could combine the two fields 
together, if you know what they are going to be "in advance."  Facets 
also give you counts, so in some special cases, you could get what you 
want - eg you can tell when there is only a single pair of values since 
their counts will be the same and the same as the total.  But that's all 
I can think of.
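
Another option, if the result set is small enough to fetch in full, is to de-duplicate on the client. This is only a sketch, assuming the matching docs come back as dicts keyed by field name; it uses the example data from the quoted message:

```python
def distinct(docs, fields):
    """Return the distinct value tuples for the given fields, in first-seen order."""
    seen = set()
    out = []
    for doc in docs:
        key = tuple(doc[f] for f in fields)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out


docs = [
    {"PK": "AB0", "FLD1": "A", "FLD2": "B", "FLD3": "0", "FLD4": "x", "FLD5": "y"},
    {"PK": "AB1", "FLD1": "A", "FLD2": "B", "FLD3": "1", "FLD4": "x", "FLD5": "y"},
    {"PK": "CD0", "FLD1": "C", "FLD2": "D", "FLD3": "0", "FLD4": "a", "FLD5": "b"},
    {"PK": "CD1", "FLD1": "C", "FLD2": "D", "FLD3": "1", "FLD4": "e", "FLD5": "f"},
]

rows = distinct(docs, ["FLD1", "FLD2", "FLD4", "FLD5"])
for row in rows:
    print(" ".join(row))
```

This does the work outside Solr, so it only makes sense when rows is large enough to cover every match.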


-Mike

On 9/11/2011 12:39 PM, Mark juszczec wrote:

Here's an example:

PK   FLD1  FLD2  FLD3  FLD4  FLD5
AB0  A     B     0     x     y
AB1  A     B     1     x     y
CD0  C     D     0     a     b
CD1  C     D     1     e     f

I want to write a query using only the terms FLD1 and FLD2 and ONLY get
back:

A B x y
C D a b
C D e f

Since FLD4 and FLD5 are the same for PK=AB0 and AB1, I only want one
occurrence of those records.

Since FLD4 and FLD5 are different for PK=CD0 and CD1, I want BOTH
occurrences of those records.

Re: searching for terms containing embedded spaces

2011-09-11 Thread Erick Erickson

OK, there are several issues here:
q= *:*  AND CUSTOMER_TYPE_NM:Network Advertiser AND
ACTIVE_IND:1&defType=edismax&rows=500&sort=ACCOUNT_CUSTOMER_ID
asc&start=0

the *:* is doing you no good, I'd just remove it.

defType=edismax probably isn't doing what you expect, you're not
specifying any fields
(no qf parameter).

This is going to your request handler that has ' default="true" '
defined. If you're using a
stock example, you're probably searching against the default search
field defined in
schema.xml, probably a field named "text".

If you have a request handler named "edismax", you can use the qt=edismax
parameter. If your request handler is named "/edismax", then use either
qt=/edismax or solr/edismax?q=

Attach &debugQuery=on and look at the parsed form of the
query.

But edismax plays nicer than dismax used to, it's probably searching
against your default
search field. Which is probably NOT CUSTOMER_TYPE_NM.

String types are completely unanalyzed, so they're case sensitive. If
you want a case-insensitive
version, use something like KeywordTokenizer followed by
LowerCaseFilter. The admin/analysis
page will help you a lot here.

I think you'll get a lot of insight into this if you attach
&debugQuery=on and look at the
 and  sections (after the results list).

Best
Erick

On Sun, Sep 11, 2011 at 2:25 PM, Mark juszczec  wrote:
> The field's properties are:
>
> field name="CUSTOMER_TYPE_NM" type="string" indexed="true" stored="true"
> required="true" default="CUSTOMER_TYPE_NM_MISSING"
>
> There have been no changes since I last completely rebuilt the index.
>
> Is re-indexing done when an index is completely rebuilt with a
> dataimport=full?   How about if we've done dataimport=delta?
>
> If it helps, this is what I get when I print out the ModifiableSolrParams
> object I'm sending to the query method:
>
> q=+*%3A*++AND+CUSTOMER_TYPE_NM%3ANetwork+Advertiser+AND+ACTIVE_IND%3A1&defType=edismax&rows=500&sort=ACCOUNT_CUSTOMER_ID+asc&start=0
>
> Mark
>
> On Sun, Sep 11, 2011 at 2:05 PM, Yonik Seeley 
> wrote:
>
>> On Sun, Sep 11, 2011 at 1:39 PM, Mark juszczec 
>> wrote:
>> > That's what I thought.  The problem is, its not and I am unsure what is
>> > wrong.
>>
>> What is the fieldType definition for that field?  Did you change it
>> without re-indexing?
>>
>> -Yonik
>> http://www.lucene-eurocon.com - The Lucene/Solr User Conference
>>
>

Re: solr equivalent of "select distinct"

2011-09-11 Thread Erick Erickson

Hmmm, there's no good way I can think of off the top of my
head to do this. Whenever people find themselves thinking
in terms of RDBMSs, I have to ask whether the problem is
really appropriate for a search engine. And/or what the problem
you're trying to solve with this approach is from a higher level.
Perhaps there's another approach completely that would
serve

Best
Erick

On Sun, Sep 11, 2011 at 12:39 PM, Mark juszczec  wrote:
> Erick
>
> Thanks very much for the reply.
>
> I typed this late Friday after work and tried to simplify the problem
> description.  I got something wrong.  Hopefully this restatement is better:
>
> My PK is FLD1, FLD2 and FLD3 concatenated together.
>
> In some cases FLD1 and FLD2 can be the same.  The ONLY differing field being
> FLD3.
>
> Here's an example:
>
> PK   FLD1      FLD2    FLD3 FLD4 FLD5
> AB0  A            B          0     x       y
> AB1  A            B          1     x       y
> CD0  C            D          0     a       b
> CD1  C            D          1     e       f
>
> I want to write a query using only the terms FLD1 and FLD2 and ONLY get
> back:
>
> A B x y
> C D a b
> C D e f
>
> Since FLD4 and FLD5 are the same for PK=AB0 and AB1, I only want one
> occurrence of those records.
>
> Since FLD4 and FLD5 are different for PK=CD0 and CD1, I want BOTH
> occurrences of those records.
>
> I'm hoping I can use wildcards to get FLD4 and FLD5.  If not, I can use fl=
>
> I'm using edismax.
>
> We are also creating the query string on the fly.  I suspect using SolrJ and
> plugging the values into a bean would be easier - or do I have that wrong?
>
> I hope the tables of example data display properly.
>
> Mark
>
> On Sun, Sep 11, 2011 at 12:06 PM, Erick Erickson 
> wrote:
>
>> This smells like an XY problem, can you back up and give a higher-level
>> reason *why* you want this behavior?
>>
>> Because given your problem description, this seems like you are getting
>> correct behavior no matter how you define the problem. You're essentially
>> saying that you have two records with identical beginnings of your PK,
>> why is it incorrect to give you both records?
>>
>> But, anyway, if you're searching on FLD1 and FLD2, then by definition
>> you're going to get both records back or the search would be failing!
>>
>> Best
>> Erick
>>
>> On Fri, Sep 9, 2011 at 8:08 PM, Mark juszczec 
>> wrote:
>> > Hello everyone
>> >
>> > Let's say each record in my index contains fields named PK, FLD1, FLD2,
>> FLD3
>> >  FLD100
>> >
>> > PK is my solr primary key and I'm creating it by concatenating
>> > FLD1+FLD2+FLD3 and I'm guaranteed that combination will be unique
>> >
>> > Let's say 2 of these records have FLD1 = A and FLD2 = B.  I am unsure
>> about
>> > the remaining fields
>> >
>> > Right now, if I do a query specifying FLD1 = A and FLD2 = B then I get
>> both
>> > records.  I only want 1.
>> >
>> > Research says I should use faceting.  But this:
>> >
>> > q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2 &
>> > facet=true & facet_field=FLD1 & facet_field=FLD2
>> >
>> > gives me 2 records.
>> >
>> > In fact, it gives me the same results as:
>> >
>> > q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2
>> >
>> > I'm wrong somewhere, but I'm unsure where.
>> >
>> > Is faceting the right way to go or should I be using grouping?
>> >
>> > Curiously, when I use grouping like this:
>> >
>> > q=FLD1:A and FLD2:B &rows=500 &defType=edismax &indent=true &fl=FLD1,
>> FLD2
>> > &group=true &group.field=FLD1 &group.field=FLD2
>> >
>> > I get 2 records as well.
>> >
>> > Has anyone dealt with mimicking "select distinct" in Solr?
>> >
>> > Any advice would be very appreciated.
>> >
>> > Mark
>> >
>>
>

Re: searching for terms containing embedded spaces

2011-09-11 Thread Mark juszczec

The field's properties are:

field name="CUSTOMER_TYPE_NM" type="string" indexed="true" stored="true"
required="true" default="CUSTOMER_TYPE_NM_MISSING"

There have been no changes since I last completely rebuilt the index.

Is re-indexing done when an index is completely rebuilt with a
dataimport=full?   How about if we've done dataimport=delta?

If it helps, this is what I get when I print out the ModifiableSolrParams
object I'm sending to the query method:

q=+*%3A*++AND+CUSTOMER_TYPE_NM%3ANetwork+Advertiser+AND+ACTIVE_IND%3A1&defType=edismax&rows=500&sort=ACCOUNT_CUSTOMER_ID+asc&start=0

Mark

On Sun, Sep 11, 2011 at 2:05 PM, Yonik Seeley wrote:

> On Sun, Sep 11, 2011 at 1:39 PM, Mark juszczec 
> wrote:
> > That's what I thought.  The problem is, its not and I am unsure what is
> > wrong.
>
> What is the fieldType definition for that field?  Did you change it
> without re-indexing?
>
> -Yonik
> http://www.lucene-eurocon.com - The Lucene/Solr User Conference
>

Re: searching for terms containing embedded spaces

2011-09-11 Thread Yonik Seeley

On Sun, Sep 11, 2011 at 1:39 PM, Mark juszczec  wrote:
> That's what I thought.  The problem is, its not and I am unsure what is
> wrong.

What is the fieldType definition for that field?  Did you change it
without re-indexing?

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference

Re: searching for terms containing embedded spaces

2011-09-11 Thread Mark juszczec

That's what I thought.  The problem is, it's not and I am unsure what is
wrong.



On Sun, Sep 11, 2011 at 1:35 PM, Yonik Seeley wrote:

> On Sun, Sep 11, 2011 at 1:15 PM, Mark juszczec 
> wrote:
> > I am looking for a text string with a single, embedded space.  For the
> > purposes of this example, it is "a b" and it's stored in the index in a
> > field called field.
> >
> > Am I incorrect in assuming the query field:"a b" will match the string a
> > followed by a single embedded space followed by a b?
>
> Yes, that should work regardless of how the field is indexed (as a big
> single token, or as a normal text field that doesn't preserve spaces).
>
> -Yonik
> http://www.lucene-eurocon.com - The Lucene/Solr User Conference
>

Re: searching for terms containing embedded spaces

2011-09-11 Thread Yonik Seeley

On Sun, Sep 11, 2011 at 1:15 PM, Mark juszczec  wrote:
> I am looking for a text string with a single, embedded space.  For the
> purposes of this example, it is "a b" and it's stored in the index in a field
> called field.
>
> Am I incorrect in assuming the query field:"a b" will match the string a
> followed by a single embedded space followed by a b?

Yes, that should work regardless of how the field is indexed (as a big
single token, or as a normal text field that doesn't preserve spaces).

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference

Re: searching for terms containing embedded spaces

2011-09-11 Thread Mark juszczec

>
> But as Erick says, it's not clear that's really what you want (to
> search on a single term with a space in it).  If it's a normal text
> field, each word will be indexed separately, so you really want a
> phrase query or a boolean query:
>
> field:"a b"
> or
> field:(a b)
>
>
I am looking for a text string with a single, embedded space.  For the
purposes of this example, it is "a b" and it's stored in the index in a field
called field.

Am I incorrect in assuming the query field:"a b" will match the string a
followed by a single embedded space followed by a b?

I'm also wondering if this is already handled by the Solr/SolrJ API and if
we are making our lives more difficult by assembling the query strings
ourselves.

Mark


> -Yonik
> http://www.lucene-eurocon.com - The Lucene/Solr User Conference
>

Re: searching for terms containing embedded spaces

2011-09-11 Thread Yonik Seeley

On Sun, Sep 11, 2011 at 12:56 PM, Mark juszczec  wrote:
> We've also tried making it create
>
> field:a\ b
>
> The first case just does not work and I'm unsure why.
>
> The second case ends up url encoding the \ and I'm unsure if that will cause
> it to be used in the query or not.

URL encoding is just part of the transfer syntax for an HTTP GET/POST
- by the time the query makes it to the lucene/solr query parser, that
escaping will have been removed.

You can also use
http://lucene.apache.org/solr/api/org/apache/solr/search/TermQParserPlugin.html
and not worry about any escaping.

But as Erick says, it's not clear that's really what you want (to
search on a single term with a space in it).  If it's a normal text
field, each word will be indexed separately, so you really want a
phrase query or a boolean query:

field:"a b"
or
field:(a b)
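
A small Python sketch of how these pieces fit together when the query string is assembled by hand. The helper names phrase and term are made up for illustration; the {!term f=...} form is the local-params syntax that I believe selects the term query parser linked above. Standard URL encoding is then applied to the whole q value:

```python
from urllib.parse import urlencode


def phrase(field, value):
    """Build field:"value", escaping embedded backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '{}:"{}"'.format(field, escaped)


def term(field, value):
    """Build a {!term} query; the value is passed through without escaping."""
    return "{{!term f={}}}{}".format(field, value)


params = {"q": phrase("field", "a b"), "rows": 10}
query_string = urlencode(params)

print(query_string)          # q=field%3A%22a+b%22&rows=10
print(term("field", "a b"))  # {!term f=field}a b
```

The encoded form is only what travels over HTTP; the query parser sees field:"a b" again after decoding.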

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference

Re: searching for terms containing embedded spaces

2011-09-11 Thread Mark juszczec

Erick

My field contains "a b" (without ")

We are trying to assemble the query as a String by appending the various
values.  I think that is a large part of the problem and our lives would be
easier if we let the Solr api do this work.

We've experimented with our "query assembler" producing

field:a+b

We've also tried making it create

field:a\ b

The first case just does not work and I'm unsure why.

The second case ends up url encoding the \ and I'm unsure if that will cause
it to be used in the query or not.

Mark

On Sun, Sep 11, 2011 at 12:10 PM, Erick Erickson wrote:

> Try escaping it for a start.
>
> But why do you want to? If it's a phrase query, enclose it in double
> quotes.
> You really have to provide more details, because there are too many
> possibilities
> to answer. For instance:
>
> If you're entering field:a b then 'b' will be searched against your
> default text field
> and you should enter field:(a b) or field:a field:b
>
> If you've tokenized the field, you shouldn't care.
>
> If you're using keywordanalyzer, escaping should work.
>
> Etc.
> 
>
> Best
> Erick
>
> On Fri, Sep 9, 2011 at 8:11 PM, Mark juszczec 
> wrote:
> > Hi folks
> >
> > I've got a field that contains 2 words separated by a single blank.
> >
> > What's the trick to creating a search string that contains the single
> blank?
> >
> > Mark
> >
>

Re: solr equivalent of "select distinct"

2011-09-11 Thread Mark juszczec

Erick

Thanks very much for the reply.

I typed this late Friday after work and tried to simplify the problem
description.  I got something wrong.  Hopefully this restatement is better:

My PK is FLD1, FLD2 and FLD3 concatenated together.

In some cases FLD1 and FLD2 can be the same.  The ONLY differing field being
FLD3.

Here's an example:

PK   FLD1  FLD2  FLD3  FLD4  FLD5
AB0  A     B     0     x     y
AB1  A     B     1     x     y
CD0  C     D     0     a     b
CD1  C     D     1     e     f

I want to write a query using only the terms FLD1 and FLD2 and ONLY get
back:

A B x y
C D a b
C D e f

Since FLD4 and FLD5 are the same for PK=AB0 and AB1, I only want one
occurrence of those records.

Since FLD4 and FLD5 are different for PK=CD0 and CD1, I want BOTH
occurrences of those records.

I'm hoping I can use wildcards to get FLD4 and FLD5.  If not, I can use fl=

I'm using edismax.

We are also creating the query string on the fly.  I suspect using SolrJ and
plugging the values into a bean would be easier - or do I have that wrong?

I hope the tables of example data display properly.

Mark

On Sun, Sep 11, 2011 at 12:06 PM, Erick Erickson wrote:

> This smells like an XY problem, can you back up and give a higher-level
> reason *why* you want this behavior?
>
> Because given your problem description, this seems like you are getting
> correct behavior no matter how you define the problem. You're essentially
> saying that you have two records with identical beginnings of your PK,
> why is it incorrect to give you both records?
>
> But, anyway, if you're searching on FLD1 and FLD2, then by definition
> you're going to get both records back or the search would be failing!
>
> Best
> Erick
>
> On Fri, Sep 9, 2011 at 8:08 PM, Mark juszczec 
> wrote:
> > Hello everyone
> >
> > Let's say each record in my index contains fields named PK, FLD1, FLD2,
> FLD3
> >  FLD100
> >
> > PK is my solr primary key and I'm creating it by concatenating
> > FLD1+FLD2+FLD3 and I'm guaranteed that combination will be unique
> >
> > Let's say 2 of these records have FLD1 = A and FLD2 = B.  I am unsure
> about
> > the remaining fields
> >
> > Right now, if I do a query specifying FLD1 = A and FLD2 = B then I get
> both
> > records.  I only want 1.
> >
> > Research says I should use faceting.  But this:
> >
> > q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2 &
> > facet=true & facet_field=FLD1 & facet_field=FLD2
> >
> > gives me 2 records.
> >
> > In fact, it gives me the same results as:
> >
> > q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2
> >
> > I'm wrong somewhere, but I'm unsure where.
> >
> > Is faceting the right way to go or should I be using grouping?
> >
> > Curiously, when I use grouping like this:
> >
> > q=FLD1:A and FLD2:B &rows=500 &defType=edismax &indent=true &fl=FLD1,
> FLD2
> > &group=true &group.field=FLD1 &group.field=FLD2
> >
> > I get 2 records as well.
> >
> > Has anyone dealt with mimicking "select distinct" in Solr?
> >
> > Any advice would be very appreciated.
> >
> > Mark
> >
>

Re: Nested documents

2011-09-11 Thread Erick Erickson

Does this JIRA apply?

https://issues.apache.org/jira/browse/LUCENE-3171

Best
Erick

On Sat, Sep 10, 2011 at 8:32 PM, Andy  wrote:
> Hi,
>
> Does Solr support nested documents? If not is there any plan to add such a 
> feature?
>
> Thanks.

Re: How to write this query?

2011-09-11 Thread Erick Erickson

So are you still having a problem, and if so what?

Best
Erick

On Sat, Sep 10, 2011 at 5:48 AM, crisfromnova  wrote:
> Hi,
>
> key:value1^8 key:value2^4 key:value3^2 is correct.
>
> Sorry for bad query written.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-write-this-query-tp3318577p3325033.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: searching for terms containing embedded spaces

2011-09-11 Thread Erick Erickson

Try escaping it for a start.

But why do you want to? If it's a phrase query, enclose it in double quotes.
You really have to provide more details, because there are too many
possibilities
to answer. For instance:

If you're entering field:a b then 'b' will be searched against your
default text field
and you should enter field:(a b) or field:a field:b

If you've tokenized the field, you shouldn't care.

If you're using keywordanalyzer, escaping should work.

Etc.
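
For the escaping case, a rough sketch in Python of what "escaping" means here: put a backslash in front of whitespace and query-syntax characters. The character set below is an assumption modelled on the usual Lucene query syntax specials, not an authoritative list, and escape_term is a made-up helper name.

```python
# Characters treated as special by the query syntax (assumed list).
SPECIAL = set('\\+-!():^[]"{}~*?|&;/')


def escape_term(value):
    """Backslash-escape whitespace and query-syntax characters in value."""
    out = []
    for ch in value:
        if ch in SPECIAL or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


print("field:" + escape_term("a b"))  # field:a\ b
```

This only helps when the field holds the whole value as a single token; on a tokenized field a phrase query is the better fit.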

Best
Erick

On Fri, Sep 9, 2011 at 8:11 PM, Mark juszczec  wrote:
> Hi folks
>
> I've got a field that contains 2 words separated by a single blank.
>
> What's the trick to creating a search string that contains the single blank?
>
> Mark
>

Re: solr equivalent of "select distinct"

2011-09-11 Thread Erick Erickson

This smells like an XY problem, can you back up and give a higher-level
reason *why* you want this behavior?

Because given your problem description, this seems like you are getting
correct behavior no matter how you define the problem. You're essentially
saying that you have two records with identical beginnings of your PK,
why is it incorrect to give you both records?

But, anyway, if you're searching on FLD1 and FLD2, then by definition
you're going to get both records back or the search would be failing!

Best
Erick

On Fri, Sep 9, 2011 at 8:08 PM, Mark juszczec  wrote:
> Hello everyone
>
> Let's say each record in my index contains fields named PK, FLD1, FLD2, FLD3
>  FLD100
>
> PK is my solr primary key and I'm creating it by concatenating
> FLD1+FLD2+FLD3 and I'm guaranteed that combination will be unique
>
> Let's say 2 of these records have FLD1 = A and FLD2 = B.  I am unsure about
> the remaining fields
>
> Right now, if I do a query specifying FLD1 = A and FLD2 = B then I get both
> records.  I only want 1.
>
> Research says I should use faceting.  But this:
>
> q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2 &
> facet=true & facet_field=FLD1 & facet_field=FLD2
>
> gives me 2 records.
>
> In fact, it gives me the same results as:
>
> q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2
>
> I'm wrong somewhere, but I'm unsure where.
>
> Is faceting the right way to go or should I be using grouping?
>
> Curiously, when I use grouping like this:
>
> q=FLD1:A and FLD2:B &rows=500 &defType=edismax &indent=true &fl=FLD1, FLD2
> &group=true &group.field=FLD1 &group.field=FLD2
>
> I get 2 records as well.
>
> Has anyone dealt with mimicking "select distinct" in Solr?
>
> Any advice would be very appreciated.
>
> Mark
>

Re: Running solr on small amounts of RAM

2011-09-11 Thread Erick Erickson

Well, this answer isn't much more satisfactory than "get more memory",
but about all I can say is "try it and see".

Sure, make your caches very small and monitor memory and test it out.

You'll get a sense of how fast (or slow) the queries are pretty quickly. Or
you can get a ballpark estimate of what running without caches would
do performance wise by simply measuring the first query after a restart.

Because, unfortunately, "it depends" is the only accurate answer. It
depends on how much sorting, faceting etc. you do as well as the
queries themselves.

Best
Erick

On Fri, Sep 9, 2011 at 12:48 PM, Mike Austin  wrote:
> I'm trying to push to get solr used in our environment. I know I could have
> responses saying WHY can't you get more RAM etc.., but lets just skip those
> and work with this situation.
>
> Our index is very small with 100k documents and a light load at the moment.
> If I wanted to use the smallest possible RAM on the server, how would I do
> this and what are the issues?
>
> I know that caching would be the biggest loss but if solr ran with no to
> little caching, the performance would still be ok? I know this is a relative
> question..
> This is the only application using java on this machine, would tuning java
> to use less cache help anything?
> I should set the cache settings low in the config?
> Basically, what will having a very low cache hit rate do to search speed and
> server performance?  I know more is better and it depends on what I'm
> comparing it to but if you could just answer in some way saying that it's
> not going to cripple the machine or cause 5 second searches?
>
> It's on a windows server.
>
>
> Thanks,
> Mike
>

Solr messing up the UK GBP (pound) symbol in response, even though Java file.encoding is set to UTF-8

2011-09-11 Thread Ravish Bhagdev

Any idea why Solr is unable to return the pound sign as-is?

I typed £ 1 million into the Solr admin GUI and got the following response.



0
5

on
0
Â£ 1 million
10
2.2
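
The "Â£" in the echoed q parameter is the pattern you get when the two UTF-8
bytes of "£" are decoded as ISO-8859-1. The sketch below only reproduces that
transformation. Where it happens in this setup is not confirmed by the thread;
the servlet container's decoding of the request URI is a common suspect, but
that is an assumption.

```python
original = "\u00a3 1 million"  # "£ 1 million"

# "£" (U+00A3) is the two bytes C2 A3 in UTF-8.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as ISO-8859-1 yields one character per byte,
# so C2 becomes "Â" and A3 becomes "£".
garbled = utf8_bytes.decode("iso-8859-1")

# The damage is reversible as long as no bytes were dropped on the way.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

# ascii() keeps the output printable on any console encoding.
print(ascii(garbled))
print(ascii(repaired))
```

If the round trip in the last step restores the original text, the data
itself is intact and only one decoding step along the request path uses the
wrong charset.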





Here are my Java properties, also taken from the admin interface:

java.runtime.name = Java(TM) SE Runtime Environment
sun.boot.library.path = /usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/amd64
java.vm.version = 20.1-b02
solr.data.dir = target/solr_data
shared.loader =
java.vm.vendor = Sun Microsystems Inc.
java.vendor.url = http://java.sun.com/
path.separator = :
java.vm.name = Java HotSpot(TM) 64-Bit Server VM
tomcat.util.buf.StringCache.byte.enabled = true
file.encoding.pkg = sun.io
user.country = GB
sun.java.launcher = SUN_STANDARD
sun.os.patch.level = unknown
java.vm.specification.name = Java Virtual Machine Specification
user.dir = /home/rbhagdev/SCCRepos/SCC_Platform/search/solr
java.runtime.version = 1.6.0_26-b03
java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
java.endorsed.dirs = /usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/endorsed
os.arch = amd64
java.io.tmpdir = /tmp
line.separator =

java.vm.specification.vendor = Sun Microsystems Inc.
java.naming.factory.url.pkgs = org.apache.naming
os.name = Linux
classworlds.conf = /usr/share/maven2/bin/m2.conf
sun.jnu.encoding = UTF-8
java.library.path =
/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/amd64/server:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/amd64:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
java.specification.name = Java Platform API Specification
java.class.version = 50.0
sun.management.compiler = HotSpot 64-Bit Tiered Compilers
os.version = 2.6.38-11-generic
user.home = /home/rbhagdev
user.timezone = Europe/London
catalina.useNaming = true
java.awt.printerjob = sun.print.PSPrinterJob
java.specification.version = 1.6
file.encoding = UTF-8
solr.solr.home = src/test/resources/solr_home
catalina.home =
/home/rbhagdev/SCCRepos/SCC_Platform/search/solr/target/tomcat
user.name = rbhagdev
java.class.path = /usr/share/maven2/boot/classworlds.jar
java.naming.factory.initial = org.apache.naming.java.javaURLContextFactory
package.definition =
sun.,java.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.
java.vm.specification.version = 1.0
sun.arch.data.model = 64
java.home = /usr/lib/jvm/java-6-sun-1.6.0.26/jre
sun.java.command = org.codehaus.classworlds.Launcher "tomcat:run-war"
java.specification.vendor = Sun Microsystems Inc.
user.language = en
java.vm.info = mixed mode
java.version = 1.6.0_26
java.ext.dirs =
/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/ext:/usr/java/packages/lib/ext
securerandom.source = file:/dev/./urandom
sun.boot.class.path =
/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/resources.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/rt.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/jsse.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/jce.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/charsets.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/modules/jdk.boot.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/classes
java.vendor = Sun Microsystems Inc.
server.loader =
maven.home = /usr/share/maven2
catalina.base = /home/rbhagdev/SCCRepos/SCC_Platform/search/solr/target/tomcat
file.separator = /
java.vendor.url.bug = http://java.sun.com/cgi-bin/bugreport.cgi
common.loader = ${catalina.home}/lib,${catalina.home}/lib/*.jar
sun.cpu.endian = little
sun.io.unicode.encoding = UnicodeLittle
package.access =
sun.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.,sun.beans.
sun.desktop = gnome
sun.cpu.isalist =

Thanks,

Ravish

Re: NRT and commit behavior

2011-09-11 Thread Erick Erickson

Hmm, OK. You might want to look at the non-cached filter query stuff;
it's quite recent. The point here is that it is a filter that is applied
only after all of the less expensive filter queries are run. One of its
uses is exactly ACL calculations. Rather than calculate the ACL for the
entire doc set, it only calculates access for docs that have made it past
all the other elements of the query. See SOLR-2429 and note that it is
3.4 only (3.4 is currently being released).
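
As a rough illustration of the non-cached filter idea, the request below marks
one filter with the cache=false and cost local parameters so that it is kept
out of the filter cache and ordered after the cheaper filters. The field names
(type, acl) and values are hypothetical, and how late the filter actually runs
depends on the query type, so treat this as a sketch rather than a recipe.

```python
from urllib.parse import parse_qs, urlencode


def build_params(user_query, cheap_filter, acl_filter, cost=200):
    """Return a query string with one cached and one non-cached filter."""
    params = [
        ("q", user_query),
        # Ordinary filter: stored in the filterCache as usual.
        ("fq", cheap_filter),
        # Non-cached filter: kept out of the filterCache and ordered by cost.
        ("fq", "{!cache=false cost=%d}%s" % (cost, acl_filter)),
    ]
    return urlencode(params)


query_string = build_params("solr", "type:document", "acl:(group1 OR group2)")

# Round-trip the encoded string to check what the server would receive.
decoded = parse_qs(query_string)
```

The per-user ACL clause is the natural candidate for cache=false, since
caching one entry per user tends to fill the filter cache with entries that
are rarely reused.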

As to why your commits are taking so long, I have no idea, given that
you really haven't given us much to work with.

How big is your index? Are you optimizing? Have you profiled the application to
see what the bottleneck is (I/O, CPU, etc?). What else is running on your
machine? It's quite surprising that it takes that long. How much memory are you
giving the JVM? etc...

You might want to review: http://wiki.apache.org/solr/UsingMailingLists

Best
Erick


On Fri, Sep 9, 2011 at 9:41 AM, Tirthankar Chatterjee
 wrote:
> Erick,
> What you said is correct for us the searches are based on some Active 
> Directory permissions which are populated in Filter query parameter. So we 
> don't have any warming query concept as we cannot fire for every user ahead 
> of time.
>
> What we do here is that when user logs in we do an invalid query(which return 
> no results instead of '*') with the correct filter query (which is his 
> permissions based on the login). This way the cache gets warmed up with valid 
> docs.
>
> It works then.
>
>
> Also, can you please let me know why commit is taking 45 minutes to 1 hour on
> well-resourced hardware with multiple processors, 16 GB RAM, a 64-bit VM, etc.?
> We tried passing waitSearcher as false and found that inside the code it is
> hard-coded to true. Is there any specific reason? Can we change that value to
> honor what is being passed?
>
> Thanks,
> Tirthankar
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, September 01, 2011 8:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: NRT and commit behavior
>
> Hmm, I'm guessing a bit here, but using an invalid query doesn't sound very 
> safe, but I suppose it *might* be OK.
>
> What does "invalid" mean? Syntax error? not safe.
>
> search that returns 0 results? I don't know, but I'd guess that filling your 
> caches, which is the point of warming queries, might be short circuited if 
> the query returns
> 0 results but I don't know for sure.
>
> But the fact that "invalid queries return quicker" does not inspire 
> confidence since the *point* of warming queries is to spend the time up front 
> so your users don't have to wait.
>
> So here's a test. Comment out your warming queries.
> Restart your server and fire the warming query from the browser
> with &debugQuery=on and look at the QTime parameter.
>
> Now fire the same form of the query (as in the same sort, facet, grouping, 
> etc, but presumably a valid term). See the QTime.
>
> Now fire the same form of the query with a *different* value in the query. 
> That is, it should search on different terms but with the same sort, facet, 
> etc. to avoid getting your data straight from the queryResultCache.
>
> My guess is that the last query will return much more quickly than the second 
> query. Which would indicate that the first form isn't doing you any good.
>
> But a test is worth a thousand opinions.
>
> Best
> Erick
>
> On Wed, Aug 31, 2011 at 11:04 AM, Tirthankar Chatterjee 
>  wrote:
>> Also noticed that "waitSearcher" parameter value is not  honored inside 
>> commit. It is always defaulted to true which makes it slow during indexing.
>>
>> What we are trying to do is use an invalid query (which wont return any 
>> results) as a warming query. This way the commit returns faster. Are we 
>> doing something wrong here?
>>
>> Thanks,
>> Tirthankar
>>
>> -----Original Message-----
>> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
>> Sent: Monday, July 18, 2011 11:38 AM
>> To: solr-user@lucene.apache.org; yo...@lucidimagination.com
>> Subject: Re: NRT and commit behavior
>>
>> In practice, in my experience at least, a very 'expensive' commit can
>> still slow down searches significantly, I think just due to CPU (or
>> i/o?) starvation. Not sure anything can be done about that.  That's my 
>> experience in Solr 1.4.1, but since searches have always been async with 
>> commits, it probably is the same situation even in more recent versions, I'd 
>> guess.
>>
>> On 7/18/2011 11:07 AM, Yonik Seeley wrote:
>>> On Mon, Jul 18, 2011 at 10:53 AM, Nicholas Chase  
>>> wrote:
>>>> Very glad to hear that NRT is finally here!  But my question is this:
>>>> will things still come to a standstill during a commit?
>>> New updates can now proceed in parallel with a commit, and searches
>>> have always been completely asynchronous w.r.t. commits.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>

Re: Using multivalued field in map function

2011-09-11 Thread Erick Erickson

Hmmm, would it be simpler to append a clause like this?
BloggerId:12304^10 OR CoBloggerId:12304^5
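
A small sketch of what appending such a clause could look like on the client
side. The helper name is hypothetical, the field names are the ones used in
the suggestion above, and the id 12304 comes from the original question.

```python
def with_blogger_boost(base_query, blogger_id, owner_boost=10, co_boost=5):
    """Append optional, boosted clauses for one blogger to a Lucene-syntax query."""
    clause = "BloggerId:%d^%d OR CoBloggerId:%d^%d" % (
        blogger_id, owner_boost, blogger_id, co_boost)
    # Parenthesise the base query so the OR applies to it as a whole.
    return "(%s) OR %s" % (base_query, clause)


boosted = with_blogger_boost("*:*", 12304)
```

Because the clauses are joined with OR, they can also widen the set of matches,
so restrictions such as the category belong in fq. The boosts only affect the
order when results are sorted by score.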

Best
Erick

On Fri, Sep 9, 2011 at 2:14 AM, tkamphuis  wrote:
> Well, I'd like to do the following:
>
> I've got a website full of blog posts and every blog post has an owner; this
> owner is referred to through his/her id. For example: BloggerId = 123.
> It's also possible that the blog has multiple co-writers, who are also
> referred to by their BloggerId, but these ids are stored in the multivalued
> field, in my previous example SubIds.
>
> When searching for a specific blogger one searches the BloggerId.
> Search results are influenced by a number of variables: the
> country/state/more specific geographical data, the blog category, etc. For this
> I use a faceted query. Next I want to make some results more important,
> depending on the BloggerId. I tried to do this with the following query:
>
> ?q={!func}map(sum(map(BloggerId,12304,12304,2,0),map(BloggerId,12304,12304,1,0)),3,3,2)&fl=*,score&facet.field=Country&f.Country.facet.limit=6&facet.field=State&fq=(BlogCategory:internet%20OR%20BlogCategory:sports&sort=score%20desc,Top%20desc,%20SortPriority%20asc&start=0&omitHeader=true
>
> In the resulting list, blogs written by BloggerId 12304 should be on top of
> the list, followed by the blogs where BloggerId 12304 was co-writer. After
> that, all other blogs that follow the criteria but aren't written (or
> co-written) by BloggerId 12304.
>
> Any ideas? Thanks!
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Using-multivalued-field-in-map-function-tp3318843p3322023.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: indexing data from rich documents - Tika with solr3.1

2011-09-11 Thread scorpking

Oh, that works for me. Many thanks to Erik Hatcher-4. I have managed to index
from https.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3326971.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Full-search index for the database

2011-09-11 Thread Jamie Johnson

You should create separate fields in your solr schema for each field
in your database that you want recognized separately.  You can use a
query parser like edismax to do a weighted query across all of your
fields and then provide highlighting on the specific field which
matched.
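
A minimal sketch of such a request, assuming two hypothetical schema fields
(title, description) and arbitrary weights. It builds the edismax and
highlighting parameters so that each field is searched and highlighted on its
own.

```python
from urllib.parse import parse_qs, urlencode


def build_search_params(text, weighted_fields):
    """Build edismax parameters that search and highlight each field separately."""
    # qf takes "field^boost" pairs separated by spaces.
    qf = " ".join("%s^%s" % (name, boost) for name, boost in weighted_fields)
    # hl.fl lists the fields to highlight, separated by commas.
    hl_fl = ",".join(name for name, _ in weighted_fields)
    return urlencode({
        "q": text,
        "defType": "edismax",
        "qf": qf,
        "hl": "true",
        "hl.fl": hl_fl,
    })


params = build_search_params("Test", [("title", 3), ("description", 1)])

# Round-trip the encoded string to check what the server would receive.
decoded = parse_qs(params)
```

The highlighting section of the response is keyed by document and then by
field name, so the fields that show up there tell you where the match was.
Fields need to be stored to be highlighted.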

2011/9/10 Eugeny Balakhonov :
> I want to create full-text search for my database.
>
> It means that search engine should look up some string for all fields of my
> database.
>
> I have created Solr configuration for extracting and indexing data from a
> database.
>
>
>
>
>
> According to the documentation in the file schema.xml, I have created a field
> for the full-text search index:
>
>
>
>  multiValued="true"/>
>
>
>
> Also I have added strings for copying all values of all fields into this
> full-search field:
>
>
>
> ...
>
>    
>
> ...
>
>
>
> As a result I can search across all fields in my database. But I
> can't tell which field in the found record contains the requested string.
>
> The highlighting functionality just marks the string in the "TEXT" field, like
> the following:
>
>
> 
>
> 
>
>  
>
>    Any text any text Test"
>
>  
>
> 
>
> 
>
>  
>
>   Any text any text Test"
>
>  
>
> 
>
>
>
> How can I create a full-text search index that still lets me identify the
> source database field?
>
>
>
> Thx a lot.
>
> Eugeny
>
>
