Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread rajini maski
Field type is long and not multivalued.
Using the Solr 3.3 war file.
Tried on both a Solr 1.4.1 index and a Solr 3.3 index; in both cases it's not working.

query :
http://localhost:8091/Group/select?/&indent=on&q=studyid:120&sort=studyidasc,groupid
asc,subjectid asc&start=0&rows=10

all the ID fields are long

Thanks & Regards
Rajani


On Sun, Nov 13, 2011 at 7:58 AM, Erick Erickson wrote:

> Well, 3.3 has been around for quite a while, I'd suspect that
> something this fundamental would have been found...
>
> Is your field multi-valued? And what kind of field is
> studyid?
>
> You really have to provide more details, input, output, etc
> to get reasonable help. It might help to review:
>
> http://wiki.apache.org/solr/UsingMailingLists
>
> Best
> Erick
>
> On Fri, Nov 11, 2011 at 5:52 AM, rajini maski 
> wrote:
> > Hi,
> >
> > I have upgraded my Solr from 1.4.1 to 3.3. Now when I try to sort
> > on a long field, the documents are not getting sorted by it.
> >
> > Sort is working when we do sorting on a facet, e.g. facet=on
> > &facet.sort=studyid
> >
> > But when we do a simple sort on documents, sort=studyid, the sort doesn't
> > happen. Is there a bug?
> >
> >
> >
> > Regards,
> > Rajani
> >
>


Dismax, pf and qf

2011-11-14 Thread Andrea Gazzarini
Hi all,
In my dismax request handler I usually use both the qf and pf
parameters in order to do phrase and term searches with different
boosting.

Now there are some scenarios where I want only pf active (without
qf). Other than surrounding my query with double quotes, is there
another way to do that? I mean, I would like

_query_:"{!dismax pf=author^100}vincent kwner"

to fire only a phrase search, completely ignoring the qf settings,
and not also

vincent OR kwner

I saw that if I omit the qf parameter, Solr uses the default field
and subsequently returns no results, even if the pf query matches a
record.

Regards,
Andrea


Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread Michael Kuhlmann

On 14.11.2011 09:33, rajini maski wrote:

query :
http://localhost:8091/Group/select?/&indent=on&q=studyid:120&sort=studyidasc,groupid
asc,subjectid asc&start=0&rows=10


Is it a copy-and-paste error, or did you really sort on "studyidasc"?

I don't think you have a field studyidasc, and Solr should've given an 
exception that either asc or desc is missing.


-Kuli


Re: getting solr to expand Acronym

2011-11-14 Thread Tiernan OToole
thanks for the replies... the problem with synonyms is that they would need
to be maintained... new words could be entered that would need to be added to
the list on a regular basis...

@Otis: As for the option of a custom TokenFilter, how would that work? I have
not coded anything inside Solr or written any custom TokenFilters myself... I
am sure there's documentation on this, but how do you think this should
work?
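
A minimal sketch of such a filter, assuming a hand-maintained acronym map
(the class name and map contents are hypothetical, and a real filter would
also need to tokenize multi-word expansions instead of emitting them as a
single token):

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.AttributeSource;

public final class AcronymExpansionFilter extends TokenFilter {
  private final Map<String, String> acronyms;  // e.g. "cd" -> "compact disc"
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncrAtt =
      addAttribute(PositionIncrementAttribute.class);
  private AttributeSource.State saved;  // captured state of the acronym token
  private String pending;               // expansion waiting to be emitted

  public AcronymExpansionFilter(TokenStream input, Map<String, String> acronyms) {
    super(input);
    this.acronyms = acronyms;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (pending != null) {
      // Emit the expansion stacked at the same position as the acronym.
      restoreState(saved);
      termAtt.setEmpty().append(pending);
      posIncrAtt.setPositionIncrement(0);
      pending = null;
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    String expansion = acronyms.get(termAtt.toString());
    if (expansion != null) {
      saved = captureState();
      pending = expansion;
    }
    return true;  // always pass the original token through first
  }
}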

Thanks.

--Tiernan


On Fri, Nov 11, 2011 at 9:01 PM, Brandon Ramirez <
brandon_rami...@elementk.com> wrote:

> Could this be simulated through synonyms?  Could you define "CD" as a
> synonym of "Compact Disc" or vice versa?  I'm not sure if that would work,
> just brainstorming here...
>
>
> Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848
> Software Engineer II | Element K | www.elementk.com
>
>
> -Original Message-
> From: Tiernan OToole [mailto:lsmart...@gmail.com]
> Sent: Friday, November 11, 2011 5:10 AM
> To: solr-user@lucene.apache.org
> Subject: getting solr to expand Acronym
>
> Don't know if this is possible, but I need to ask anyway... Say we have a
> list of acronyms in a database (CD, DVD, CPU) and also a list of their not
> so short names (Compact Disc, Digital Versatile Disc, Central Processing
> Unit), but they are not linked in any particular way (lots of items, some
> with full names, some using acronyms). Is it possible for Solr to figure out
> that CD is an acronym of Compact Disc? I know CD could also mean Central
> Data, or anything that begins with C and D, but is there a way to tell Solr
> to look for items that not only match CD, but have adjacent words that begin
> with C and D... Another example I can think of is IBM: it could be
> International Business Machines, or Irish Business Machines, or Irish
> Banking Machines...
>
> So, would that be possible?
>
> --
> Tiernan O'Toole
> blog.lotas-smartman.net
> www.geekphotographer.com
> www.tiernanotoole.ie
>



-- 
Tiernan O'Toole
blog.lotas-smartman.net
www.geekphotographer.com
www.tiernanotoole.ie


Re: delta-import of rich documents like word and pdf files!

2011-11-14 Thread Ahmet Arslan
> Thanks for your reply, Mr. Erick.
> All I want to do is this: I have indexed some of my pdf files and doc files.
> Now, for any changes I make to them, I want a delta-import (incremental) so
> that I do not have to re-index the whole document set with a full import;
> only the changes made to these documents should get updated. I am using the
> dataimporthandler. I have looked in the forums, but all of the delta-import
> questions relate to databases. I am just indexing some of my doc and pdf
> files for now. What should I do in order to achieve that?

Can you provide your data-config.xml? 


Re: Delete by Query with limited number of rows

2011-11-14 Thread mikr00
Hi Erick, hi Yury,

thanks to your input I found a perfect solution for my case. Even though
this is not a Solr-only solution, I will briefly describe how it works,
since it might be of interest to others:

I have set up a MySQL database holding two tables. The first has only a
primary key with auto-increment and nothing else. The second has a primary key
without auto-increment, plus fields for the content I store in Solr.

Now, before I add something to the Solr core, I add an entry to the first
MySQL table. After the insertion, I get the primary key for that action and
check whether it is above my limit of documents. If so, I empty the first
MySQL table and reset the auto-increment to zero. I then insert a MySQL
entry into the second table using the primary key taken from the first table
(if the primary key exists, I do not add an entry but update the existing
one). And finally I have a Solr core which holds my searchable data and has
a uniqueKey field. Into this core I add a new document, using the
primary key from the first MySQL table for the uniqueKey field.
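
As a sketch, the id-reservation step might look like this in JDBC (the table
name, its single auto-increment column, and the wrap-around handling are
assumptions based on the description above):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class IdRing {
  /** Reserves the next document id, wrapping around once limit is reached. */
  static long reserveDocId(Connection con, long limit) throws SQLException {
    Statement st = con.createStatement();
    try {
      // id_ring has a single auto-increment primary key column and nothing else.
      st.executeUpdate("INSERT INTO id_ring () VALUES ()",
          Statement.RETURN_GENERATED_KEYS);
      ResultSet rs = st.getGeneratedKeys();
      rs.next();
      long id = rs.getLong(1);
      if (id >= limit) {
        // Wrap around: later inserts reuse ids 1..limit, so the content row
        // and the Solr document get overwritten instead of appended.
        st.executeUpdate("TRUNCATE TABLE id_ring");
        st.executeUpdate("ALTER TABLE id_ring AUTO_INCREMENT = 1");
      }
      return id;
    } finally {
      st.close();
    }
  }
}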

The solution has two main benefits for me:

- I can precisely control the number of documents in my solr core.
- I now also have a backup of my data in MySQL

Thank you very much for your help!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delete-by-Query-with-limited-number-of-rows-tp3503094p3506380.html
Sent from the Solr - User mailing list archive at Nabble.com.


Counting in facet results

2011-11-14 Thread LT.thomas
Hi,

By counting in facet results, I mean solving the following problem:

I have 7 documents:

A1   B1   C1
A2   B1   C1
A3   B2   C1
A4   B2   C2
A5   B3   C2
A6   B3   C2
A7   B3   C2

If I make a facet query on field B, I get the result: B1=2, B2=2, B3=3.
A1   B1   C1
A2   B1   C1 2 - faceting by B
--===
A3   B2   C1
A4   B2   C2 2 - faceting by B
--===
A5   B3   C2
A6   B3   C2
A7   B3   C2 3 - faceting by B

I want to get additional information, something like a count within the results, by
field C. So, how can I query to get a result similar to the following:
A1   B1   C1
A2   B1   C1 2, 1 - faceting by B, distinct count of C in the facet results
--=
A3   B2   C1
A4   B2   C2 2, 2 - faceting by B, distinct count of C in the facet results
--=
A5   B3   C2
A6   B3   C2
A7   B3   C2 3, 1 - faceting by B, distinct count of C in the facet results


Thanks 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Counting-in-facet-results-tp3506382p3506382.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: delta-import of rich documents like word and pdf files!

2011-11-14 Thread neuron005
Thanks for your reply... my data-config.xml is:

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="f" pk="id" processor="FileListEntityProcessor"
        recursive="true" rootEntity="false"
        dataSource="null" baseDir="/var/data/solr"
        fileName=".*\.(DOC)|(PDF)|(XML)|(xml)|(JPEG)|(jpg)|(ZIP)|(zip)|(pdf)|(doc)"
        onError="skip">
      <entity name="tika-test" processor="TikaEntityProcessor"
          url="${f.fileAbsolutePath}" format="text" dataSource="bin"
          onError="skip">
        [field mappings stripped in the archive]
      </entity>
      <field ... name="fileName"/>
    </entity>
  </document>
</dataConfig>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/delta-import-of-rich-documents-like-word-and-pdf-files-tp3502039p3506404.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Counting in facet results

2011-11-14 Thread Samuel García Martínez
Hi, I think what you are looking for is "nested facets" or
HierarchicalFaceting:

Category A - Subcategory A1
Category A - Subcategory A1
Category B - Subcategory A1
Category B - Subcategory B2
Category A - Subcategory A2

Faceting by Category:
 A: 3
 B: 2

In addition, pivoting on this query gives:
Cat: A=3
  SubCat: A1=2 and A2=1
Cat: B=2
  SubCat: A1=1 and B2=1

Does this make sense?
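
(For reference, on trunk/4.0 the pivot request itself would look something like
facet=true&facet.pivot=Category,Subcategory, using the field names from the
example above.)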

On Mon, Nov 14, 2011 at 11:02 AM, LT.thomas  wrote:

> Hi,
>
> By counting in facet results, I mean solving the following problem:
>
> I have 7 documents:
>
> A1   B1   C1
> A2   B1   C1
> A3   B2   C1
> A4   B2   C2
> A5   B3   C2
> A6   B3   C2
> A7   B3   C2
>
> If I make a facet query on field B, I get the result: B1=2, B2=2, B3=3.
> A1   B1   C1
> A2   B1   C1 2 - faceting by B
> --===
> A3   B2   C1
> A4   B2   C2 2 - faceting by B
> --===
> A5   B3   C2
> A6   B3   C2
> A7   B3   C2 3 - faceting by B
>
> I want to get additional information, something like a count within the results, by
> field C. So, how can I query to get a result similar to the following:
> A1   B1   C1
> A2   B1   C1 2, 1 - faceting by B, distinct count of C in the facet results
> --=
> A3   B2   C1
> A4   B2   C2 2, 2 - faceting by B, distinct count of C in the facet results
> --=
> A5   B3   C2
> A6   B3   C2
> A7   B3   C2 3, 1 - faceting by B, distinct count of C in the facet results
>
>
> Thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Counting-in-facet-results-tp3506382p3506382.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Samuel García.


Re: delta-import of rich documents like word and pdf files!

2011-11-14 Thread Ahmet Arslan

> Thanks for your reply... my data-config.xml is:
>
> [data-config.xml snipped]

According to the wiki: "the only EntityProcessor which supports delta is
SqlEntityProcessor."

Maybe you can use the newerThan parameter of FileListEntityProcessor. Issuing a
full-import with &clean=false may mimic a delta import.

You can pass the value of this newerThan parameter in your request:

command=full-import&clean=false&myLastModifiedParam=NOW-3DAYS

http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
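
For example, the entity could pick the parameter up like this (a sketch using
the parameter name above; the quoting of the date-math value may need
adjusting):

<entity name="f" processor="FileListEntityProcessor"
    baseDir="/var/data/solr"
    newerThan="${dataimporter.request.myLastModifiedParam}" ...>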




Re: TikaEntityProcessor not working?

2011-11-14 Thread kumar8anuj
The earlier issue has been resolved, but I am stuck on something else. Can you
tell me which POI jar version would work with Tika 0.6? Currently I have
poi-3.7.jar. The error I am getting is this:

SEVERE: Exception while processing: js_logins document :
SolrInputDocument[{id=id(1.0)={100984},
complete_mobile_number=complete_mobile_number(1.0)={+91 9600067575},
emailid=emailid(1.0)={vkry...@gmail.com}, full_name=full_name(1.0)={Venkat
Ryali}}]:org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NoSuchMethodError:
org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.NoSuchMethodError:
org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:163)
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:161)
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractTableContent(XWPFWordExtractorDecorator.java:140)
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:91)
at
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:69)
at
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:51)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
at
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
... 7 more


--
View this message in context: 
http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p3506596.html
Sent from the Solr - User mailing list archive at Nabble.com.


TREC-style IR experiments

2011-11-14 Thread Ismo Raitanen
Hi,

I'm planning to do some information retrieval experiments with Solr.
I'd like to compare different IR methods. I have a test collection
with topics and judgements available. I'm considering using Solr (and
not Lemur/Indri etc.) for the tests, because Solr supports several
nice methods out-of-the-box, e.g. n-grams.

Finally, I plan to evaluate the different methods and their results
with trec_eval or similar program. What I need is a program, which
puts Solr results in a suitable format for trec_eval. I think I can
get the Solr search results in that format quite easily by using the
solr-php-client library.
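
(For reference, a trec_eval run file is one line per retrieved document, of
the form "query-id Q0 doc-id rank score run-tag", so the conversion is mostly
a formatting loop.)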

Have any of you run "TREC-style" IR experiments with Solr and what are
your experiences with that? Do you have any suggestion for that kind
of tests with Solr?

Kind regards,
Ismo


Re: Using solr during optimization

2011-11-14 Thread Isan Fulia
Hi Mark,

In the above case, what if the index is optimized partially, i.e. by
specifying the max number of segments we want?
It has been observed that after optimizing (even partially), both
indexing and searching have been faster than with an unoptimized index.
Decreasing the merge factor will affect performance, as it will
increase indexing time due to the more frequent merges.
So is it better to optimize partially (say, once a month) rather than
decrease the merge factor and hurt indexing speed? Also, since we
will be sharding, that 100 GB index will be divided across different shards.

Thanks,
Isan Fulia.



On 14 November 2011 11:28, Kalika Mishra wrote:

> Hi Mark,
>
> Thanks for your reply.
>
> What you're saying is interesting; so are you suggesting that optimization
> should usually be done only when there are not many updates? Also, can you
> please point out under what conditions optimization might be beneficial.
>
> Thanks.
>
> On 11 November 2011 20:30, Mark Miller  wrote:
>
> > I would not optimize - it's very expensive. With 11,000 updates a day, I
> > think it makes sense to completely avoid optimizing.
> >
> > That should be your default move in any case. If you notice performance
> > suffers more than is acceptable (good chance you won't), then I'd use a
> > lower merge factor. It defaults to 10 - lower numbers will lower the
> number
> > of segments in your index, and essentially amortize the cost of an
> optimize.
> >
> > Optimize is generally only useful when you will have a mostly static
> index.
> >
> > - Mark Miller
> > lucidimagination.com
> >
> >
> > On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:
> >
> > > Hi Mark,
> > >
> > > We are performing almost 11,000 updates a day, we have around 50
> million
> > > docs in the index (i understand we will need to shard) the core seg
> will
> > > get fragmented over a period of time. We will need to do optimize every
> > few
> > > days or once in a month; do you have any reason not to optimize the
> core.
> > > Please let me know.
> > >
> > > Thanks.
> > >
> > > On 11 November 2011 18:51, Mark Miller  wrote:
> > >
> >> Do you have something forcing you to optimize, or are you just doing it
> >> for the heck of it?
> > >>
> > >> On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> I would like to optimize solr core which is in Reader Writer mode.
> > Since
> > >>> the Solr cores are huge in size (above 100 GB) the optimization takes
> > >> hours
> > >>> to complete.
> > >>>
> > >>> When the optimization is going on say. on the Writer core, the
> > >> application
> > >>> wants to continue using the indexes for both query and write
> purposes.
> > >> What
> > >>> is the best approach to do this.
> > >>>
> > >>> I was thinking of using a temporary index (empty core) to write the
> > >>> documents and use the same Reader to read the documents. (Please note
> > >> that
> > >>> temp index and the Reader cannot be made Reader Writer as Reader is
> > >> already
> > >>> setup for the Writer on which optimization is taking place) But there
> > >> could
> > >>> be some updates to the temp index which I would like to get reflected
> > in
> > >>> the Reader. Whats the best setup to support this.
> > >>>
> > >>> Thanks,
> > >>> Kalika
> > >>
> > >> - Mark Miller
> > >> lucidimagination.com
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Kalika
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>
> --
> Thanks & Regards,
> Kalika
>



-- 
Thanks & Regards,
Isan Fulia.


Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread rajini maski
There is no error as such.

When I do a basic sort on a *long* field, the sort doesn't happen.


Query and response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
    <lst name="params">
      <str name="fl">studyid</str>
      <str name="sort">studyid asc</str>
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">*:*</str>
      <str name="rows">100</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result ...>
    <doc><long name="studyid">53</long></doc>
    <doc><long name="studyid">18</long></doc>
    <doc><long name="studyid">14</long></doc>
    <doc><long name="studyid">11</long></doc>
    <doc><long name="studyid">7</long></doc>
    <doc><long name="studyid">63</long></doc>
    <doc><long name="studyid">35</long></doc>
    <doc><long name="studyid">70</long></doc>
    <doc><long name="studyid">91</long></doc>
    <doc><long name="studyid">97</long></doc>
  </result>
</response>

The same case works with Solr 1.4.1 but is not working in Solr 3.3.


Regards,
Rajani

On Mon, Nov 14, 2011 at 2:23 PM, Michael Kuhlmann  wrote:

> On 14.11.2011 09:33, rajini maski wrote:
>
>> query :
>> http://localhost:8091/Group/select?/&indent=on&q=studyid:120&sort=studyidasc,groupid
>> asc,subjectid asc&start=0&rows=10
>>
>
> Is it a copy-and-paste error, or did you really sort on "studyidasc"?
>
> I don't think you have a field studyidasc, and Solr should've given an
> exception that either asc or desc is missing.
>
> -Kuli
>


Re: Counting in facet results

2011-11-14 Thread LT.thomas
I use Solandra that integrates Solr 3.4 with Cassandra. So, is there any way
to solve this problem with Solr 3.4 (without pivots)?

Your results are:
Cat: A=3
  SubCat: A1=2 and A2=1
Cat: B=2
  SubCat: A1=1 and B2=1

but I would like to have:
Cat: A=3
  SubCat: 2 (losing information about the numbers within A1 and A2, only
distinct count of subcategories)
Cat: B=2
  SubCat: 2 (losing information about the numbers within A1 and B2, only
distinct count of subcategories)
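
(One possible workaround without pivots, as a sketch: issue one facet request
per B value and count the non-empty C buckets on the client side, e.g.

/select?q=*:*&rows=0&fq=B:B1&facet=on&facet.field=C&facet.mincount=1

The number of C entries returned is the distinct subcategory count for B1.
Field names are from the example above; this costs one request per category.)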

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Counting-in-facet-results-tp3506382p3506848.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: TREC-style IR experiments

2011-11-14 Thread Ahmet Arslan
> I'm planning to do some information retrieval experiments with Solr.
> I'd like to compare different IR methods. I have a test collection
> with topics and judgements available. I'm considering using Solr (and
> not Lemur/Indri etc.) for the tests, because Solr supports several
> nice methods out-of-the-box, e.g. n-grams.
>
> Finally, I plan to evaluate the different methods and their results
> with trec_eval or a similar program. What I need is a program which
> puts Solr results in a suitable format for trec_eval. I think I can
> get the Solr search results in that format quite easily by using the
> solr-php-client library.
>
> Have any of you run "TREC-style" IR experiments with Solr, and what are
> your experiences with that? Do you have any suggestions for that kind
> of test with Solr?

There are some existing implementations in Lucene:

http://lucene.apache.org/java/3_0_2/api/contrib-benchmark/org/apache/lucene/benchmark/quality/trec/package-summary.html



Casesensitive search problem

2011-11-14 Thread jayanta sahoo
Hi,
Whenever I search for the words "OfficeJet", "officejet", "Officejet", or
"oFiiIcejET", I get different results for each search. I am not able to
understand why this is happening.
   I want to solve this problem in such a way that search becomes case
insensitive and I get the same result for any combination of capital and
small letters.

-- 
Jayanta Sahoo


Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread Ahmet Arslan
> When I do a basic sort on a *long* field, the sort doesn't
> happen.
> 
> 
> Query and response:
>
> [response XML snipped]
> 
> The same case works with Solr 1.4.1 but is not working in
> Solr 3.3.

Can you try with the following type?

<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

And studyid must be marked as indexed="true".


Re: Casesensitive search problem

2011-11-14 Thread Parvin Gasimzade
Check this :
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseFilterFactory
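
A typical field type applies the filter at both index and query time, for
example (the type name here is arbitrary, and the field must be re-indexed
after changing its analysis chain):

<fieldType name="text_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>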

On Mon, Nov 14, 2011 at 3:24 PM, jayanta sahoo  wrote:

> Hi,
> Whenever I search for the words "OfficeJet", "officejet", "Officejet", or
> "oFiiIcejET", I get different results for each search. I am not able to
> understand why this is happening.
>   I want to solve this problem in such a way that search becomes case
> insensitive and I get the same result for any combination of capital and
> small letters.
>
> --
> Jayanta Sahoo
>


Re: Using solr during optimization

2011-11-14 Thread Mark Miller

On Nov 14, 2011, at 8:27 AM, Isan Fulia wrote:

> Hi Mark,
> 
> In the above case, what if the index is optimized partially, i.e. by
> specifying the max number of segments we want?
> It has been observed that after optimizing (even partially), both
> indexing and searching have been faster than with an
> unoptimized index.

Yes, this remains true - searching against fewer segments is faster than 
searching against many segments. Unless you have a really high merge factor, 
this is just generally not a big deal IMO.

It tends to be something like, a given query is say 10-30% slower. If you have 
good performance though, this should often be something like a 50ms query goes 
to 80 or 90ms. You really have to decide/test if there is a practical 
difference to your users.

You should also pay attention to how long that perf improvement lasts while you 
are continuously adding more documents. Is it a super high cost for a short 
perf boost?

> Decreasing the merge factor will affect performance, as it will
> increase indexing time due to the more frequent merges.

True - it will essentially amortize the cost of reducing segments. Have you 
tested lower merge factors though? Does it really slow down indexing to the 
point where you find it unacceptable? I've been surprised in the past. Usually 
you can find a pretty nice balance.

> So is it better to optimize partially (say, once a month) rather than
> decrease the merge factor and hurt indexing speed? Also, since we
> will be sharding, that 100 GB index will be divided across different shards.

Partial optimize is a good option, and optimize is an option. They both exist 
for a reason ;) Many people pay the price because they assume they have to 
though, when they really have no practical need.

Generally, the best way to manage the number of segments in your index is 
through the merge policy IMO - not necessarily optimize calls.
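
For example, in a 3.x solrconfig.xml the merge factor is set under
indexDefaults, and a partial optimize can be posted to /update (the values
here are purely illustrative, not recommendations):

<indexDefaults>
  <mergeFactor>4</mergeFactor>
</indexDefaults>

<optimize maxSegments="2"/>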

I'm pretty sure optimize also blocks adds in previous versions of Solr as well - 
it grabs the commit lock. It won't do that in Solr 4, but that is another 
reason I wouldn't recommend it under normal circumstances.

I look at optimize as a last option, or when creating a static index personally.

> 
> Thanks,
> Isan Fulia.
> 
> 
> 
> On 14 November 2011 11:28, Kalika Mishra wrote:
> 
>> Hi Mark,
>> 
>> Thanks for your reply.
>> 
>> What you're saying is interesting; so are you suggesting that optimization
>> should usually be done only when there are not many updates? Also, can you
>> please point out under what conditions optimization might be beneficial.
>> 
>> Thanks.
>> 
>> On 11 November 2011 20:30, Mark Miller  wrote:
>> 
>>> I would not optimize - it's very expensive. With 11,000 updates a day, I
>>> think it makes sense to completely avoid optimizing.
>>> 
>>> That should be your default move in any case. If you notice performance
>>> suffers more than is acceptable (good chance you won't), then I'd use a
>>> lower merge factor. It defaults to 10 - lower numbers will lower the
>> number
>>> of segments in your index, and essentially amortize the cost of an
>> optimize.
>>> 
>>> Optimize is generally only useful when you will have a mostly static
>> index.
>>> 
>>> - Mark Miller
>>> lucidimagination.com
>>> 
>>> 
>>> On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:
>>> 
 Hi Mark,
 
 We are performing almost 11,000 updates a day, we have around 50
>> million
 docs in the index (i understand we will need to shard) the core seg
>> will
 get fragmented over a period of time. We will need to do optimize every
>>> few
 days or once in a month; do you have any reason not to optimize the
>> core.
 Please let me know.
 
 Thanks.
 
 On 11 November 2011 18:51, Mark Miller  wrote:
 
> Do you have something forcing you to optimize, or are you just doing it
> for the heck of it?
> 
> On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
> 
>> Hi,
>> 
>> I would like to optimize solr core which is in Reader Writer mode.
>>> Since
>> the Solr cores are huge in size (above 100 GB) the optimization takes
> hours
>> to complete.
>> 
>> When the optimization is going on say. on the Writer core, the
> application
>> wants to continue using the indexes for both query and write
>> purposes.
> What
>> is the best approach to do this.
>> 
>> I was thinking of using a temporary index (empty core) to write the
>> documents and use the same Reader to read the documents. (Please note
> that
>> temp index and the Reader cannot be made Reader Writer as Reader is
> already
>> setup for the Writer on which optimization is taking place) But there
> could
>> be some updates to the temp index which I would like to get reflected
>>> in
>> the Reader. Whats the best setup to support this.
>> 
>> Thanks,
>> Kalika
> 
> - Mark Miller
> lucidimagination.com
> 

Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread rajini maski

On Mon, Nov 14, 2011 at 7:23 PM, Ahmet Arslan  wrote:

> > When I do a basic sort on a *long* field, the sort doesn't
> > happen.
> >
> > Query and response:
> >
> > [response XML snipped]
> >
> > The same case works with Solr 1.4.1 but is not working in Solr 3.3.
>
> Can you try with the following type?
>
> <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
> omitNorms="true" positionIncrementGap="0"/>
>
> And studyid must be marked as indexed="true".
>


I tried this one: <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

It didn't work :(

Sort didn't happen


Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread Ahmet Arslan
> I tried this one: <fieldType name="tlong" class="solr.TrieLongField"
> precisionStep="8" omitNorms="true"
> positionIncrementGap="0"/>
> 
> It didn't work :(
> 
> Sort didn't happen


Did you restart tomcat and perform re-index?


XSLT caching mechanism

2011-11-14 Thread vrpar...@gmail.com
Hello All,

I am using XSLT to transform the Solr XML response; when a search is made,
I get the warning below:

WARNING [org.apache.solr.util.xslt.TransformerProvider] The
TransformerProvider's simplistic XSLT caching mechanism is not appropriate
for high load scenarios, unless a single XSLT transform is used and
xsltCacheLifetimeSeconds is set to a sufficiently high value.

How can I apply effective XSLT caching for Solr?



Thanks,
Vishal Parekh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/XSLT-caching-mechanism-tp3506979p3506979.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: XSLT caching mechanism

2011-11-14 Thread Erik Hatcher
Set the cache lifetime high, like it says.

Questions - why use the XSLT response writer?  What are you transforming the 
response into and digesting it with?

Erik

On Nov 14, 2011, at 09:31 , vrpar...@gmail.com wrote:

> Hello All,
> 
> I am using XSLT to transform the Solr XML response; when a search is made,
> I get the warning below:
> 
> WARNING [org.apache.solr.util.xslt.TransformerProvider] The
> TransformerProvider's simplistic XSLT caching mechanism is not appropriate
> for high load scenarios, unless a single XSLT transform is used and
> xsltCacheLifetimeSeconds is set to a sufficiently high value.
> 
> How can I apply effective XSLT caching for Solr?
> 
> 
> 
> Thanks,
> Vishal Parekh
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/XSLT-caching-mechanism-tp3506979p3506979.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: delta-import of rich documents like word and pdf files!

2011-11-14 Thread Erick Erickson
And you cannot update-in-place. That is, you can't update
just selected fields in a document, you have to re-index the
whole document.

Best
Erick

On Mon, Nov 14, 2011 at 6:11 AM, Ahmet Arslan  wrote:
>
>> Thanks for your reply... my data-config.xml is:
>>
>> [data-config.xml snipped]
>
> According to wiki : "the only EntityProcessor which supports delta is 
> SqlEntityProcessor."
>
> May be you can use newerThan parameter of FileListEntityProcessor. Issuing a 
> full-import with &clean=false may mimic delta import.
>
> You can pass value of this newerThan parameter in your request.
>
> command=full-import&clean=false&myLastModifiedParam=NOW-3DAYS
>
> http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
>
>
>


Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread rajini maski
Yes .

On 11/14/11, Ahmet Arslan  wrote:
>> I tried this one: <fieldType name="tlong" class="solr.TrieLongField"
>> precisionStep="8" omitNorms="true"
>> positionIncrementGap="0"/>
>>
>> It didn't work :(
>>
>> Sort didn't happen
>
>
> Did you restart tomcat and perform re-index?
>


Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread Ahmet Arslan
> Yes .


> > Did you restart tomcat and perform re-index?
> >
> 

Okay, one thing left: HTTP caching may cause a stale response. Clear your 
browser's cache if you are using a browser to query Solr. 


Re: XSLT caching mechanism

2011-11-14 Thread Chantal Ackermann
In solrconfig.xml, change the xsltCacheLifetimeSeconds property of the
XSLTResponseWriter to the desired value (this example 6000secs):

<queryResponseWriter name="xslt" class="org.apache.solr.response.XSLTResponseWriter">
  <int name="xsltCacheLifetimeSeconds">6000</int>
</queryResponseWriter>


On Mon, 2011-11-14 at 15:31 +0100, vrpar...@gmail.com wrote:
> Hello All,
> 
> I am using XSLT to transform the Solr XML response; when a search is made,
> I get the warning below:
> 
> WARNING [org.apache.solr.util.xslt.TransformerProvider] The
> TransformerProvider's simplistic XSLT caching mechanism is not appropriate
> for high load scenarios, unless a single XSLT transform is used and
> xsltCacheLifetimeSeconds is set to a sufficiently high value.
> 
> How can I apply effective XSLT caching for Solr?
> 
> 
> 
> Thanks,
> Vishal Parekh
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/XSLT-caching-mechanism-tp3506979p3506979.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Easy way to tell if there are pending documents

2011-11-14 Thread Latter, Antoine
Hi Solr,

Does anyone know of an easy way to tell if there are pending documents waiting 
for commit?

Our application performs operations that are never safe to perform while 
commits are pending. We make this work by making sure that all indexing 
operations end in a commit, and stop the unsafe operations from running while a 
commit is running.

This works great most of the time, except when we have enough disk space to add 
documents to the pending area, but not enough disk space to do a commit - then 
the indexing operations only error out after they've done all of their adds.

It would be nice if the unsafe operation could somehow detect that there are 
pending documents and abort.

In the interim I'll have the unsafe operation perform a commit when it starts, 
but I've been weeding out useless commits from my app recently and I don't like 
them creeping back in.

Thanks,
Antoine


get a total count

2011-11-14 Thread U Anonym
Hello everyone,

A newbie question: how do I find out how many documents have been indexed
across all shards?
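
(For what it's worth, a distributed query with rows=0 reports the total
across the listed shards in numFound; the shard addresses here are
hypothetical:

http://host1:8983/solr/select?q=*:*&rows=0&shards=host1:8983/solr,host2:8983/solr
)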

Thanks much!


memory usage keep increase

2011-11-14 Thread Yongtao Liu
Hi all,

I see an issue where RAM usage keeps increasing when we run queries.
After looking in the code, it looks like Lucene uses MMapDirectory to map index files 
into RAM.

According to the 
http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/store/MMapDirectory.html
 comments, it will use a lot of memory:
NOTE: memory mapping uses up a portion of the virtual memory address space in 
your process equal to the size of the file being mapped. Before using this 
class, be sure your have plenty of virtual address space, e.g. by using a 64 
bit JRE, or a 32 bit JRE with indexes that are guaranteed to fit within the 
address space.

So, my understanding is that Solr requires physical RAM >= index file size; is 
that right?

Yongtao



Re: TREC-style IR experiments

2011-11-14 Thread Ismo Raitanen
>> I'm planning to do some information retrieval experiments with Solr.

> There some existing implementations in Lucene
> http://lucene.apache.org/java/3_0_2/api/contrib-benchmark/org/apache/lucene/benchmark/quality/trec/package-summary.html

Have you used that with Solr? How?

//Ismo


Help! - ContentStreamUpdateRequest

2011-11-14 Thread Tod

Could someone take a look at this page:

http://wiki.apache.org/solr/ContentStreamUpdateRequestExample

... and tell me what code changes I would need to make to be able to 
stream a LOT of files at once rather than just one?  It has to be 
something simple, like a collection of some sort, but I just can't get it 
figured out.  Maybe I'm using the wrong class altogether?
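
One straightforward approach is simply to loop, sending one request per file.
A sketch against the 3.x SolrJ API follows; the URL, directory, and the
choice of literal.id are assumptions:

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class BulkExtract {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    for (File f : new File("/path/to/docs").listFiles()) {
      ContentStreamUpdateRequest req =
          new ContentStreamUpdateRequest("/update/extract");
      req.addFile(f);                            // one file per request
      req.setParam("literal.id", f.getName());   // each doc needs a unique key
      server.request(req);
    }
    server.commit();                             // commit once, after all files
  }
}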



TIA


Re: Question about solr caches and warming

2011-11-14 Thread Chris Hostetter

: Although I don't have statistics to back my claim, I suspect that the really
: nasty filters don't have as high a hitcount as the ones that are more simple.
: Typically the really nasty filters are used when an employee logs into the
: site.  Employees have access to a lot more than customers do, but the search
: still needs to be filtered to be appropriate for whatever search options are
: active.

A low impact change to consider would be to leverage the "cache=false" 
local param feature that was added in Solr 3.4...

  https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters

...you could add this localparam anytime you know the query is coming from 
an employee -- or anytime you know the filter query is "esoteric"
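
For example (the filter field and value are hypothetical):
fq={!cache=false}acl:esoteric_employee_filter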

A higher impact change would be to create a dedicated query slave 
machine (or just an alternate core name that polls the same master) that 
is *only* used by employees and has much lower sizes on the caches -- this 
is the approach i have advocated and seen work very well since the 
pre-apache days of Solr: dedicated instances for each major "user base" 
with key settings (ie: replication frequencies, cache sizes, cache 
warming, static warming of sorts, etc...) tuned for that user base.  

-Hoss


Getting 411 Length required when adding docs

2011-11-14 Thread Darniz
Hello All, 
I am seeing this strange issue of an HTTP 411 "Length Required" error. My Solr
is hosted with a third-party hosting company and it was working fine all this
while. I really don't understand why this happened. Attached is the stack
trace; any help will be appreciated.

org.apache.solr.common.SolrException: Length Required
Length Required

request: http://www.listing-social.com/solr/update?wt=javabin&version=1
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
at org.apache.solr.client.solrj.SolrServer.addBean(SolrServer.java:68)
at
com.listings.solr.service.impl.BulkIndexingServiceImpl.startBulkIndexing(BulkIndexingServiceImpl.java:55)
at
com.listings.action.BulkIndexingAction.execute(BulkIndexingAction.java:42)
at
org.apache.struts.chain.commands.servlet.ExecuteAction.execute(ExecuteAction.java:53)
at
org.apache.struts.chain.commands.AbstractExecuteAction.execute(AbstractExecuteAction.java:64)
at
org.apache.struts.chain.commands.ActionCommandBase.execute(ActionCommandBase.java:48)
at org.apache.commons.chain.impl.ChainBase.execute(ChainBase.java:190)
at
org.apache.commons.chain.generic.LookupCommand.execute(LookupCommand.java:304)
at org.apache.commons.chain.impl.ChainBase.execute(ChainBase.java:190)
at
org.apache.struts.chain.ComposableRequestProcessor.process(ComposableRequestProcessor.java:280)
at 
org.apache.struts.action.ActionServlet.process(ActionServlet.java:1858)
at org.apache.struts.action.ActionServlet.doGet(ActionServlet.java:446)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:362)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)

Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-411-Length-required-when-adding-docs-tp3508372p3508372.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Keyword counts

2011-11-14 Thread Chris Hostetter

: Thanks for the reply. There are many keyword terms (<1000?) and not sure if
: Solr would choke on a query string that long. Perhaps solr is not built to

Did you try it?

1000 facet.query params is not a strain for Solr -- but you may find 
problems with your servlet container if you try specifying them all in a 
GET request.

if this list isn't going to change very often it sounds like a perfect use 
case for specifying as "appends" request params on the request 
handler declaration in your solrconfig.xml
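
For example (the field name and terms are hypothetical):

<requestHandler name="/keywords" class="solr.SearchHandler">
  <lst name="appends">
    <str name="facet.query">keyword:foo</str>
    <str name="facet.query">keyword:bar</str>
    <!-- ...one facet.query per keyword term... -->
  </lst>
</requestHandler>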

see the comments in solrconfig.xml for examples.

-Hoss


Index format difference between 4.0 and 3.4

2011-11-14 Thread roz dev
Hi All,

We are using Solr 1.4.1 in production and are considering an upgrade to a
newer version.

It seems that Solr 3.x requires a complete rebuild of the index, as the format
seems to have changed.

Is Solr 4.0 index file format compatible with Solr 3.x format?

Please advise.

Thanks
Saroj


File based wordlists for spellchecker

2011-11-14 Thread Tomasz Wegrzanowski
Hi,

I have a very large index, and I'm trying to add a spell checker for it.
I don't want to copy all the text in the index to an extra spell field, since
that would be prohibitively big (the index is already close to as big as it
can reasonably be), so I just want to extract word frequencies as I index,
for offline processing.

After some filtering I get something like this (word, frequency):

a   122958495
aa  834203
aaa 175206
22389
aaab1522
aaai1050
aaas6384
aab 8109
aabb1906
aac 35100
aacc1692
aachen  11723

I wanted to use FileBasedSpellChecker, but it doesn't support frequencies,
so its recommendations are consistently horrible. Increasing the frequency
cutoff won't really help much: it will still suggest less frequent words
over equally similar, more frequent words.

What's the easiest way to get this working?
Presumably I'd need to create a separate index with just these words.
How do I get frequencies there without actually creating 11723 records with
"aachen" in them, etc.?

I can do some small Java coding if need be.
I'm already using 3.x branch (mostly for edismax, plus some unrelated
minor patches).

Thanks,
Tomasz


Re: Casesensitive search problem

2011-11-14 Thread jsahoo1...@gmail.com
Hi,
Even though I have used all the possible ways, like <filter
class="solr.LowerCaseFilterFactory"/> in both the index and query analyzers,
I am still getting the same problem. If anyone has faced the same problem
before, please let me know how you solved it.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Casesensitive-search-problem-tp3506883p3508765.html
Sent from the Solr - User mailing list archive at Nabble.com.