Re: Solr Not Searching while INDEXING the DATA

2014-05-05 Thread Sohan Kalsariya
Thanks a lot, Shawn, for the help!
We have given Solr a dedicated server, and its RAM size is 650 MB.
This didn't happen when I was doing it locally.
I have seen the same problem in the Sphinx framework, but there it was solved
using a feature called "rotate",
and we were able to run search queries while indexing.



On Mon, May 5, 2014 at 8:59 PM, Shawn Heisey  wrote:

> On 5/5/2014 5:39 AM, Sohan Kalsariya wrote:
> > I am not able to search for the data while indexing.
> > Indexing is done via the dataimport handler.
> > While searching for the documents (in between indexing is happening), it
> > gives the broken pipe exception and wont search anything.
> > What should be the proper solution for this problem?
>
> A broken pipe exception means that your client gave up and timed out
> before Solr could respond, so it closed the TCP connection.  When Solr
> finally was able to respond, the connection was gone, so the servlet
> container logged that exception.
>
> The most common reason for underlying performance issues that causes
> problems like this is that you don't have enough RAM.  It could be
> something else, of course.  A number of possible options are covered on
> this wiki page:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> I see that you asked the same question on the IRC channel early this
> morning (in my timezone), but you were gone before I was awake to see that.
>
> Thanks,
> Shawn
>
>


-- 
Regards,
*Sohan Kalsariya*


Re: Histogram facet?

2014-05-05 Thread Romain Rigaux
The dates won't match unless you truncate all of them to the day. But then,
if you want slots of 15 minutes, it won't work, as you would need to
truncate the dates to 15-minute boundaries in the index.

In ES, one field defines the slots and another field supplies the values
that go into each bucket, e.g.:

{
  "query" : {
    "match_all" : {}
  },
  "facets" : {
    "histo1" : {
      "date_histogram" : {
        "key_field" : "timestamp",
        "value_field" : "price",
        "interval" : "day"
      }
    }
  }
}
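
For comparison, what that facet computes can be approximated on the client. This is only a minimal Python sketch, assuming each document comes back as a (timestamp, value) pair; the sample rows and the `histogram` helper are made up for illustration and are not part of ES or Solr.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def histogram(rows, interval):
    """Sum the value of each (timestamp, value) row per time slot."""
    epoch = datetime(1970, 1, 1)
    totals = defaultdict(float)
    for ts, value in rows:
        # Floor the timestamp to the start of the slot it falls in.
        slot = epoch + ((ts - epoch) // interval) * interval
        totals[slot] += value
    return dict(sorted(totals.items()))

# Hypothetical sample data: (timestamp, price)
rows = [
    (datetime(2014, 1, 1, 9, 5), 100),
    (datetime(2014, 1, 1, 9, 12), 100),
    (datetime(2014, 1, 2, 18, 40), 100),
]
by_day = histogram(rows, timedelta(days=1))
by_quarter_hour = histogram(rows, timedelta(minutes=15))
```

Because the slot is computed at query time, the same rows can be bucketed by day or by 15 minutes without re-truncating anything in the index.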

Romain


On Mon, May 5, 2014 at 9:05 PM, Erick Erickson wrote:

> Hmmm, I _think_ pivot faceting works here. One dimension would be day
> and the other retweet count. The response will have the number of
> retweets per day, you'd have to sum them up I suppose.
>
> Best,
> Erick
>
> On Mon, May 5, 2014 at 3:18 PM, Romain  wrote:
> > Hi,
> >
> > I am trying to plot a non-date field by time in order to draw a histogram
> > showing its evolution during the week.
> >
> > For example, if I have a tweet index:
> >
> > Tweet:
> >   date
> >   retweetCount
> >
> > 3 tweets indexed:
> > Tweet | Date  | Retweet
> > A     | 01/01 | 100
> > B     | 01/01 | 100
> > C     | 01/02 | 100
> >
> > If I want to plot the number of tweets by day: easy with a date range facet:
> > Day 1: 2
> > Day 2: 1
> >
> > But now counting the number of retweet by day is not possible natively:
> > Day 1: 200
> > Day 2: 100
> >
> > One current workaround would be to do a date range facet to get the date
> > slots, ask only for the retweet field, and compute the sums in the
> > client. We could compute other stats, like the average, that way too.
> >
> > The closest I could see was
> > https://issues.apache.org/jira/browse/SOLR-4772 but it seems to be
> > slightly different.
> >
> > Basically I am trying to do something very similar to the Date Histogram
> > Facet
> > <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-date-histogram-facet.html#search-facets-date-histogram-facet>
> > in ES.
> >
> > Is there a way to move the counting logic to the Solr server?
> >
> > Thanks!
> >
> > Romain
>


Re: Indexing scanned PDFs

2014-05-05 Thread Alexandre Rafalovitch
Nothing I am aware of for Solr directly. You may have better luck
chasing this on the Tika mailing list, as Tika is what Solr uses under
the covers to index PDFs. A quick search for Tika and OCR
brings up a number of links.

Regards,
  Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, May 6, 2014 at 12:15 PM, Chandan Tamrakar
 wrote:
> we are using SOLr to index pdf documents but there are cases where PDFs
> are usually a scanned document  with no text to extract and index .
>
> Is there a plugin or module in SOLR that we can integrate so that it would
> actually extract a text / OCR and then index?
>
>
> Thanks in advance
>
> Chandan Tamrakar


Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Frankcis
Thank you, Erick, you're a good man.
This is the client's requirement:
the forum contains many discussions under different subjects. A keyword
search returns documents whose content or subject matches the query.
The client wants these documents grouped by subject, and the groups sorted
by the sum of the scores of the documents in each subject.

I would be glad to hear your suggestions.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134869.html
Sent from the Solr - User mailing list archive at Nabble.com.


Indexing scanned PDFs

2014-05-05 Thread Chandan Tamrakar
We are using Solr to index PDF documents, but some of the PDFs are
scanned documents with no text to extract and index.

Is there a plugin or module for Solr that we can integrate so that it would
run OCR to extract the text and then index it?


Thanks in advance

Chandan Tamrakar


Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Erick Erickson
Frankly, I really don't know how to make that happen. I took a quick
look at the function query stuff (I don't have them all memorized yet)
and I just can't seem to make them bend that way.

I can imagine  writing custom code to make it work but I don't really
know how much effort would be involved. I suspect it would not be
trivial.

What I'd do is go back to the client and ask _them_ why it would be
useful. Along with some estimate for figuring out what was necessary
and let _them_ figure out whether it was worth it. Say a week's worth
of effort to scope the work involved. From my viewpoint, given that
the use of this feature is questionable at best, it's a service to the
client to force them to lay out a clear use-case for this capability
and also give them some kind of cost (in this case, just the cost to
figure out _how_ to do it, not actually do it).

Then they can make a rational decision whether the functionality is
worth it. One outcome for them is to say "yes, our use case is
compelling enough we're willing to pay you to figure out how to make
it happen". Another outcome is for them to say "Oh, if it's not OOB
functionality, it's not worth much effort". Yet a third response is
"You're right, that makes no sense whatsoever, don't bother".

Until and unless you give them the feedback that this is not OOB
functionality, and get them to explain why they think it's valuable
and let them know that it'll likely cost a significant amount, you're
not giving them the information to make a rational decision.

I've just seen way too many features implemented in various projects
that wind up taking a lot of effort without being useful...

There, rant finished.

Best,
Erick

On Mon, May 5, 2014 at 9:37 PM, Frankcis  wrote:
> thank you, Erick, you're right, the maxScore of document within each group is
> more effective than the sum of scores in a group, especially some use-case
> just as your assumption(group 1 could have 10M documents all with a score of
> .01 and group 2 could have 1 document with a score of 1,000 and group 1
> would sort
> first) ,but the function is required by the client, can you tell me the way
> how to achieve it ?
>
>
>


Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Frankcis
Thank you, Erick, you're right: the maxScore of the documents within each
group is more useful than the sum of the scores in a group, especially in a
case like the one you describe (group 1 could have 10M documents all with a
score of .01, group 2 could have 1 document with a score of 1,000, and
group 1 would sort first). But the function is required by the client. Can
you tell me how to achieve it?





Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Erick Erickson
You haven't answered _why_ this is a good idea. I'm having a hard
time understanding what would be _useful_ about sorting this way. Just
because the sum of scores in a group is greater than the sum of scores
in another says _nothing_ about how relevant any of the docs in the group
are relative to each other.

I mean group 1 could have 10M documents all with a score of .01 and group
2 could have 1 document with a score of 1,000 and group 1 would sort
first.

So unless you have some unusual use-case which you haven't yet articulated,
this seems like a bad idea.
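
The arithmetic behind the 10M-document example above can be checked with a few lines of Python. This is a minimal sketch using only the made-up numbers from that example; the `groups` layout is hypothetical and has nothing to do with how Solr stores grouped results.

```python
# Hypothetical groups: name -> (number of matching docs, score of each doc).
groups = {
    "group 1": (10_000_000, 0.01),
    "group 2": (1, 1000.0),
}

def score_sum(item):
    count, score = item[1]
    return count * score  # every doc in a group has the same score here

def score_max(item):
    return item[1][1]

# Order the group names by each criterion, best first.
by_sum = [name for name, _ in sorted(groups.items(), key=score_sum, reverse=True)]
by_max = [name for name, _ in sorted(groups.items(), key=score_max, reverse=True)]
```

Sorting by the sum puts group 1 first purely because of its size, while sorting by the maximum puts the group holding the single strong match first.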

Best,
Erick

On Mon, May 5, 2014 at 7:20 PM, Frankcis  wrote:
> my scheme.xml:
> 
>   
> omitNorms="true"/>
> positionIncrementGap="0"/>
>
> positionIncrementGap="100" omitNorms="false"
> autoGeneratePhraseQueries="false">
>
>  class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"
> mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
>  words="stopwords.txt"/>
>  synonyms="synonyms.txt"
> ignoreCase="false" expand="true"/>
> 
> 
>  class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"
> mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
>  words="stopwords.txt"/>
>  synonyms="synonyms.txt"
> ignoreCase="false" expand="true"/>
> 
>   
>   
>
>  
>indexed="true"  stored="true"
> multiValued="false" required="true" />
>stored="true"  multiValued="false" />
>multiValued="false" />
>stored="true" />
>
>   
>  
>
>  id
>
>
>  name
>
>
>  
> 
>
> update docs:
> "docs": [
>   {
> "name": "苹果4s",
> "type": "手机",
> "price": 2000,
> "id": "4017e35a-6b19-45b6-b945-382340ca1eec",
> "_version_": 1466799722505175000
>   },
>   {
> "name": "苹果5",
> "type": "手机",
> "price": 5000,
> "id": "4052d9f3-f6d9-458f-8bb0-477b17852f37",
> "_version_": 1466799735745544200
>   },
>   {
> "name": "三星",
> "type": "手机",
> "price": 3000,
> "id": "468abce8-8bb9-4f51-9900-8d4d6abc02ac",
> "_version_": 1466799747596550100
>   },
>   {
> "name": "摩托罗拉i3",
> "type": "电脑",
> "price": 1000,
> "id": "db66bb02-3d6a-4ab0-9133-2e6e38b3d4dd",
> "_version_": 1466799757491961900
>   },
>   {
> "name": "摩托罗拉i5",
> "type": "电脑",
> "price": 1500,
> "id": "f211525f-bc3c-4ea7-aded-1c46a94ecd1c",
> "_version_": 1466799766311534600
>   }
> ]
> thank you , Erick,
> i want to sort groups based on the sum of documents' scores within each
> group, as you said, solr excels at getting the score of single documents, in
> solr 4.6, the default sort of group each other depends on the maxScore of
> all documents within each group, but the sum of documents' scores, though i
> can get the sum of documents' scores by the client program, it's not good
> idea, l know that the stats component of solr can statistics the long field,
> so I had the idea to use statistic data for score field, but the score is
> pse-udo field, the stats.field doesn't support it. In addition, as
> scheme.xml displayed,  i do group on the elements of a string field(type)
> without using participle.
>
>
>


Re: Wildcard malfunctioning

2014-05-05 Thread Alexandre Rafalovitch
I mark all the filters that support wildcards with (multi) on my list:
http://www.solr-start.com/info/analyzers/ . I use the actual interface
markers to derive that list, so it should be up to date.
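
Why a stemmer and a wildcard clash can be shown with a toy model. The Python sketch below assumes a stemmer that leaves short tokens alone and strips one trailing vowel from longer ones; that rule is an assumption chosen to be consistent with the results reported in this thread, not the real SpanishLightStemFilterFactory algorithm, and all function names are hypothetical.

```python
def toy_stem(token):
    """Toy stand-in for a light stemmer (NOT the real Spanish algorithm):
    leave short tokens alone, strip one trailing vowel from longer ones."""
    if len(token) >= 5 and token[-1] in "aeo":
        return token[:-1]
    return token

docs = ["uva", "naranja"]
indexed = {toy_stem(d) for d in docs}  # terms as stored in the index

def term_query(word):
    # A plain term goes through the same analysis as at index time.
    return toy_stem(word) in indexed

def prefix_query(prefix):
    # A wildcard term skips the stemmer, so the raw prefix is compared
    # against the already-stemmed terms.
    return any(term.startswith(prefix) for term in indexed)
```

In this model `naranja` is indexed as `naranj`, so the plain query still matches while the prefix `naranja` has nothing to match against; `uva` is too short to be stemmed, so both forms work.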

Regards,
   Alex.


On Mon, May 5, 2014 at 6:19 PM, Jack Krupansky  wrote:
> Generally, stemming filters are not supported when wildcards are present.
> Only a small subset of filters work with wildcards, such as the case
> conversion filters.
>
> But you say that you are using the stemmer to remove diacritical marks...
> you can/should use ASCIIFoldingFilterFactory or MappingCharFilterFactory.
>
> -- Jack Krupansky
>
> -Original Message- From: Román González
> Sent: Monday, May 5, 2014 7:00 AM
> To: solr-user@lucene.apache.org
> Subject: Wildcard malfunctioning
>
>
> Hi all!
>
>
>
> Sorry in advance if this question was already posted, but I was unable to
> find it with search engines.
>
>
>
> Filter SpanishLightStemFilterFactory is not working properly with wildcards
> or I’m misunderstanding something. I have the field
>
>
>
>   
>
>
>
> With this type:
>
>
>
> positionIncrementGap="100">
>
>  
>
>
>
>
>
> words="lang/stopwords_es.txt" format="snowball" />
>
>
>
>
>
>  
>
>
>
>
>
> But I’m getting these results:
>
>
>
> q = cultivo_es:uva
>
> Getting 50 correct results
>
>
>
> q = cultivo_es:uva*
>
> Getting the same 50 correct results
>
>
>
> q = cultivo_es:naranja
>
> Getting the 50 correct results of “naranja”
>
>
>
> q = cultivo_es:naranja*
>
> Getting the 0 results !
>
>
>
> It works fine if I remove SpanishLightStemFilterFactory filter, but I need
> it in order to filter diacritics according to Spanish rules.
>
>
>
> Thank you!!
>
>
>


Re: Histogram facet?

2014-05-05 Thread Erick Erickson
Hmmm, I _think_ pivot faceting works here. One dimension would be day
and the other retweet count. The response will have the number of
retweets per day, you'd have to sum them up I suppose.
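
The summing step could look roughly like the Python sketch below. It assumes the pivot output has already been flattened into (day, retweetCount value, number of docs) rows; that row layout is an assumption for illustration and is not the actual shape of the Solr response.

```python
from collections import Counter

# (day, retweetCount value, number of docs with that value on that day)
pivot_rows = [
    ("01/01", 100, 2),  # tweets A and B
    ("01/02", 100, 1),  # tweet C
]

retweets_per_day = Counter()
for day, retweet_count, num_docs in pivot_rows:
    # Each pivot bucket contributes its value once per matching document.
    retweets_per_day[day] += retweet_count * num_docs
```

With the three sample tweets from the original question this gives 200 for the first day and 100 for the second.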

Best,
Erick

On Mon, May 5, 2014 at 3:18 PM, Romain  wrote:
> Hi,
>
> I am trying to plot a non date field by time in order to draw an histogram
> showing its evolution during the week.
>
> For example, if I have a tweet index:
>
> Tweet:
>   date
>   retweetCount
>
> 3 tweets indexed:
> Tweet | Date  | Retweet
> A     | 01/01 | 100
> B     | 01/01 | 100
> C     | 01/02 | 100
>
> If I want to plot the number of tweets by day: easy with a date range facet:
> Day 1: 2
> Day 2: 1
>
> But now counting the number of retweet by day is not possible natively:
> Day 1: 200
> Day 2: 100
>
> One current workaround would be to do a date range facet to get the date
> slots, ask only for the retweet field, and compute the sums in the
> client. We could compute other stats, like the average, that way too.
>
> The closest I could see was
> https://issues.apache.org/jira/browse/SOLR-4772 but it seems to be
> slightly different.
>
> Basically I am trying to do something very similar to the Date Histogram
> Facet
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-date-histogram-facet.html#search-facets-date-histogram-facet>
> in ES.
>
> Is there a way to move the counting logic to the Solr server?
>
> Thanks!
>
> Romain


Re: Relevancy help

2014-05-05 Thread Alexandre Rafalovitch
Can you sort by score, then date? That assumes similar articles will get
the same score (you may need to discount frequency/length).
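
As a rough illustration, the request parameters for such a two-level sort could be built like this in Python. The field name displaydatetime is taken from the boost function quoted below; the query text is the example from the original question, and nothing here is tuned or tested against a real index.

```python
from urllib.parse import urlencode

params = {
    "q": "malaysia airline crash blackbox",
    # Primary sort on relevancy; ties on score fall back to newest first.
    "sort": "score desc,displaydatetime desc",
}
query_string = urlencode(params)
```

The date only decides the order among documents whose scores are exactly equal, which is why the scores of similar articles need to be flattened for this to have a visible effect.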

There is also QueryRescore API introduced in Lucene 4.8 that might be
relevant. Though I have no idea how that would get exposed in Solr.

Regards,
   Alex.


On Tue, May 6, 2014 at 5:12 AM, Ahmet Arslan  wrote:
> Hi Ravi,
>
> Regarding recency please see : 
> http://www.slideshare.net/lucenerevolution/potter-timothy-boosting-documents-in-solr
>
> Regarding "docs containing all words" there is function query that elevates 
> those docs to top. Search existing mailing list past posts.
>
> Ahmet
>
>
> On Tuesday, May 6, 2014 12:42 AM, Ravi Solr  wrote:
>
> Hello,
> I have a weird relevancy requirement. We search news content hence
> chronology is very important and also relevancy, although both are mutually
> exclusive. For example, if the search terms are -  malaysia airline crash
> blackbox - my requirements are as follows
>
> Docs containing all words should be on top, but editorial also wants
> them sorted in reverse chronological order without losing relevancy. Why
> ?? If on day 1 there is an article about search for blackbox but on day 2
> the blackbox is found and day 3 there is an article about blackbox being
> unusable...from the user's standpoint it makes sense that we show most
> recent content on top.
>
> I already boost recency of docs with
> boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1) i.e. increments of
> 3 months
>
> However when I do the boost the chronology is messed up. I know relevancy
> and sorting are mutually exclusive concepts. Is there any magic that we can
> do in SOLR which can achieve both ???
>
>
> Thanks,
>
> Ravi Kiran bhaskar


Re: Strict Search in Apache Solr

2014-05-05 Thread Alexandre Rafalovitch
You can do phrase search explicitly with quotes. Or you could look at
something like Term query parser:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermQueryParser

You can also enable autoGeneratePhraseQueries on the field type to try
the phrase queries, but that's in addition to trying individual terms:
https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties
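
On the client side, turning the user's words into an explicit phrase query could be sketched as follows in Python. The field name `text` and the helper name are assumptions for illustration; the escaping only covers backslashes and double quotes inside the phrase.

```python
def phrase_query(field, phrase):
    """Wrap the user's words in double quotes so they match as a phrase."""
    # Escape backslashes first, then any quotes inside the phrase itself.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'

q = phrase_query("text", "your future")
```

The resulting string can be sent as the q parameter; without the quotes the two words would be searched as individual terms.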

Regards,
   Alex.


On Tue, May 6, 2014 at 5:35 AM, Jack Krupansky  wrote:
> The term "strict search" is not in the Lucene/Solr nomenclature - it could
> mean any number of things.
>
> It sounds as if maybe you want to do a phrase search, looking for an exact
> phrase - yes, you can do that by enclosing the phrase in quotes.
>
> -- Jack Krupansky
>
> -Original Message- From: Reyes, Mark
> Sent: Monday, May 5, 2014 5:23 PM
> To: solr-user@lucene.apache.org
> Subject: Strict Search in Apache Solr
>
>
> How could Solr accomplish an end-user behavior like a strict search?
>
> Let’s say an end-user decides to use quotation marks in their keywords to
> provide specificity in their search results.
>
> Current:
> If you were to query: your future, then 10 results would return and print to
> the page.
>
> Expected:
> I’d like to query: “your future”, then less than 10 results would return and
> print to the page.
>
> Regards,
> Mark


Re: Solr does not recognize language

2014-05-05 Thread Frankcis
Hi iorixxx, I'm Frankcis, not Victor. Did you send this to the wrong address?


2014-05-05 23:20 GMT+08:00 iorixxx [via Lucene] <
ml-node+s472066n4134713...@n3.nabble.com>:

> Hi Victor,
>
> I don't know mysolr, I assume you are using /update/json, lets add your
> chain to defaults section.
>
>   
>
> 
>  application/json
>  langid
>
>   
>
>
>
>
> On Monday, May 5, 2014 4:06 PM, Victor Pascual <[hidden 
> email]>
> wrote:
> Hi there,
>
> I'm indexing my documents using mysolr. I mainly generate a lost of json
> objects and the run: solr.update(documents_array,'json')
>
>
>
> On Mon, May 5, 2014 at 1:08 PM, Ahmet Arslan <[hidden 
> email]>
> wrote:
>
> > Hi Victor,
> >
> > How do you index your documents? Your last config looks correct. However
> > for example if you use data import handler you need to add update.chain
> > there too. Same as extraction request hadler if you are using sole-cell.
> >
> >  > class="org.apache.solr.handler.dataimport.DataImportHandler">
> > 
> >   /home/username/data-config.xml
> >   langid
> > 
> >   
> >
> > By the way The URL
> > http://localhost:8080/solr/update?commit=true&update.chain=langid was
> > just an example and meant to feed xml update messages by POST method.
> Not
> > to use in a browser.
> >
> > Ahmet
> >
> > On Monday, May 5, 2014 11:04 AM, Victor Pascual <
> > [hidden email] >
> wrote:
> >
> > Thank you very much for you help Ahmet.
> >
> > However the language detection is still not workin. :(
> > My solrconfig.xml didn't contain that lst section inside the update
> > requestHandler.
> > That's the content I added:
> >
> >> >  class="solr.XmlUpdateRequestHandler">
> > >   
> > > langid
> > >   
> > >
> > >
> >
> >
> > >>
> class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>
> > >  
> > >text
> > >lang
> > >  
> > >
> > >
> > >   
> > > 
> >
> > Now, your suggested query
> > http://localhost:8080/solr/update?commit=true&update.chain=langid returns
>
> >
> > 
> > >
> > >0
> > >14
> > >
> > >
> > And there is still no lang field in my documents.
> > Any idea what am I doing wrong?
> >
> >
> >
> >
> > On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan <[hidden 
> > email]>
> wrote:
> >
> > Hi,
> > >
> > >solr/update should be used, not /solr/select
> > >
> > >curl '
> http://localhost:8983/solr/update?commit=true&update.chain=langid'
> > >
> > >By the way don't you have following definition in your solrconfig.xml?
> > >
> > > 
> > >
> > >   
> > > langid
> > >   
> > >  
> > >
> > >
> > >
> > >
> > >On Tuesday, April 29, 2014 4:50 PM, Victor Pascual <
> > [hidden email] >
> wrote:
> > >Hi Ahmet,
> > >
> > >thanks for your reply. Adding &update.chain=langid to my query doesn't
> > >work: IP:8080/solr/select/?q=*%3A*&update.chain=langid
> > >Regarding defining the chain in an UpdateRequestHandler... sorry for
> the
> > >lame question but shall I paste those three lines to solrconfig.xml, or
> > >shall I add them somewhere else?
> > >
> > >There is not UpdateRequestHandler in my solrconfig.
> > >
> > >Thanks!
> > >
> > >
> > >
> > >On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan <[hidden 
> > >email]>
> wrote:
> > >
> > >> Hi,
> > >>
> > >> Did you attach your chain to a UpdateRequestHandler?
> > >>
> > >> You can do it by adding &update.chain=langid to the URL or defining
> it
> > in
> > >> a defaults section as follows
> > >>
> > >> 
> > >>  langid
> > >>
> > >>
> > >>
> > >>
> > >> On Tuesday, April 29, 2014 3:18 PM, Victor Pascual <
> > >> [hidden email] >
> wrote:
> > >> Dear all,
> > >>
> > >> I'm a new user of Solr. I've managed to index a bunch of documents
> (in
> > >> fact, they are tweets) and everything works quite smoothly.
> > >>
> > >> Nevertheless it looks like Solr doesn't detect the language of my
> > documents
> > >> nor remove stopwords accordingly so I can extract the most frequent
> > terms.
> > >>
> > >> I've added this piece of XML to my solrconfig.xml as well as the Tika
> > lib
> > >> jars.
> > >>
> > >> 
> > >> > >>
> > >>
> >
> class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>
> > >>   
> > >> text
> > >> lang
> > >>   
> > >> 
> > >> 
> > >>
> > >>  
> > >>
> > >> There is no error in the tomcat log file, so I have no clue of why
> this
> > >> isn't working.
> > >> Any hint on how to solve this problem will be much appreciated!
> >

Re: Linking Two Fields Together

2014-05-05 Thread Alexandre Rafalovitch
You can have two parallel multi-valued fields and, as long as you don't
introduce null/empty values, they will be kept together. However, for
recent Solr (4.7? certainly 4.8), you may want to look at parent/child
entries and join/parent/child queries.
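
Reading the two parallel fields back on the client could be sketched like this in Python. The field names come from the question below; the document values are made up, and the length check is just a guard against the null/empty case mentioned above.

```python
# A document as it might come back from Solr; the values are made up.
doc = {
    "tom_image_path": ["files/a.jpg", "files/b.jpg"],
    "im_image_file_id": [17, 42],
}

paths = doc["tom_image_path"]
file_ids = doc["im_image_file_id"]

# A length mismatch means a value was skipped and the pairing is unreliable.
if len(paths) != len(file_ids):
    raise ValueError("parallel fields are out of sync")

# Pair the values position by position.
images = list(zip(file_ids, paths))
```

The pairing relies purely on position, so both fields must always be written with the same number of values in the same order.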

Regards,
   Alex.


On Tue, May 6, 2014 at 7:20 AM, Steve Edwards  wrote:
> I'm using Solr to create an image search functionality that allows users to 
> search for an existing image in the site to add to new content.  A given 
> piece of content has a field that can store multiple images, so I will need 
> to use a multi-value Solr field to store image data. Currently, I'm storing 
> the path and file name in a tom_* field, since I want to be able to search on 
> file name. However, another piece of data that I need to store and retrieve 
> is the file id used to identify the file in the database (in the same table 
> as the image path). What is the best way to store this data so that the file 
> id and path values are properly synced, since there can be multiple images 
> for each piece of content?  I could just store the file path/name (I need 
> that data to be searchable, so it has to be stored in Solr), and then query 
> the db for the fid once I get the results back, but I'd rather not do that if 
> I don't have to.
>
> Searching around, it doesn't appear that I can store multiple pieces of data 
> in one field without doing some sort of concatenation and then splitting at 
> query time.  If I just use two separate fields in each document, is it safe 
> to assume that the values will be synchronized in the search results? In 
> other words, if I put two values each into tom_image_path and 
> im_image_file_id, when I query and the document is returned, can I assume the 
> values in the two fields are synchronized?
>
> Or, is there a way to store multiple pieces of data in one field so that they 
> can be indexed together and then retrieved together?
>
> Thanks.
>
> Steve


Re: Help to Understand a Solr Query

2014-05-05 Thread Alexandre Rafalovitch
If you are looking for that level of understanding, your best bet is to
enable the debug flag. Then you will get a full breakdown of what
matched which field and why, including scores, preferences, etc.,
possibly with debug.explain.structured enabled:
http://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured

Most people do not want to deep dive into debug info. But I am getting
the feeling this would be right where you want to go.
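
A request with debugging switched on could be assembled as in the Python sketch below. The query, qf and fl values are copied from the question quoted below; the host, port and handler path are assumptions and need to be adapted to the actual installation.

```python
from urllib.parse import urlencode

params = {
    "q": "samplestring1 AND samplestring2",
    "defType": "edismax",
    "qf": "Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7",
    "fl": "Column1,Column2",
    "debugQuery": "true",                # adds the debug section to the response
    "debug.explain.structured": "true",  # nested explain output instead of text
}
# Assumed base URL; replace with the real host and core.
url = "http://localhost:8983/solr/select?" + urlencode(params)
```

The parsed query in the debug section shows how the AND between the two terms is combined with the per-field clauses generated from qf.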

Regards,
   Alex.


On Tue, May 6, 2014 at 1:47 AM, nativecoder  wrote:
> That answer helps a lot
>
> Where would the OR clause be ?
>
> (Exact_Field1:samplestring1 OR Exact_Field1:samplestring2) AND
> (Exact_Field2:samplestring1 OR Exact_Field2:samplestring2) AND
> (Field1:samplestring1 OR Field1:samplestring2) AND (Field2:samplestring1
> OR Field2:samplestring2)
>
> Please note that in my query it is an AND clause. I am trying to understand
> where the AND fits in. To be more precise my query is as below
>
> q=samplestring1 AND samplestring2&defType: edismax&qf: Exact_Field1^1.0
> Exact_Field2^0.9 Field1^0.8 Field2^0.7&fl= Column1, Column2
>
>
>
>


Re: Anybody uses Solr JMX?

2014-05-05 Thread Alexandre Rafalovitch
Thanks Otis,

JMXC looks interesting, though I cannot seem to find the "Open Source"
section on your website it used to link to.

Regards,
   Alex.


On Tue, May 6, 2014 at 9:43 AM, Otis Gospodnetic
 wrote:
> Alexandre, you could use something like
> http://blog.sematext.com/2012/09/25/new-tool-jmxc-jmx-console/ to quickly
> dump everything out of JMX and see if there is anything there Solr Admin UI
> doesn't expose.  I think you'll find there is more in JMX than Solr Admin
> UI shows.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Mon, May 5, 2014 at 1:56 AM, Alexandre Rafalovitch 
> wrote:
>
>> Thank you everybody for the links and explanations.
>>
>> I am still curious whether JMX exposes more details than the Admin UI?
>> I am thinking of a troubleshooting context, rather than long-term
>> monitoring one.
>>
>> Regards,
>>Alex.
>> Personal website: http://www.outerthoughts.com/
>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>> proficiency
>>
>>
>> On Mon, May 5, 2014 at 12:21 PM, Gora Mohanty  wrote:
>> > On May 5, 2014 7:09 AM, "Alexandre Rafalovitch" 
>> wrote:
>> >>
>> >> I have religiously kept  statement in my solrconfig.xml, thinking
>> >> it was enabling the web interface statistics output.
>> >>
>> >> But looking at the server logs really closely, I can see that JMX is
>> >> actually disabled without server present. And the Admin UI does not
>> >> actually seem to care after a quick test.
>> >>
>> >> Does anybody have a real experience with Solr JMX? Does it expose more
>> >> information than Admin UI's Plugins/Stats page? Is it good for
>> >>
>> >
>> > Have not been using JMX lately, but we were using it in the past. It does
>> > allow monitoring many useful details. As others have commented, it also
>> > integrates well with other monitoring  tools as JMX is a standard.
>> >
>> > Regards,
>> > Gora
>>


Re: Anybody uses Solr JMX?

2014-05-05 Thread Otis Gospodnetic
Alexandre, you could use something like
http://blog.sematext.com/2012/09/25/new-tool-jmxc-jmx-console/ to quickly
dump everything out of JMX and see if there is anything there Solr Admin UI
doesn't expose.  I think you'll find there is more in JMX than Solr Admin
UI shows.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, May 5, 2014 at 1:56 AM, Alexandre Rafalovitch wrote:

> Thank you everybody for the links and explanations.
>
> I am still curious whether JMX exposes more details than the Admin UI?
> I am thinking of a troubleshooting context, rather than long-term
> monitoring one.
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Mon, May 5, 2014 at 12:21 PM, Gora Mohanty  wrote:
> > On May 5, 2014 7:09 AM, "Alexandre Rafalovitch" 
> wrote:
> >>
> >> I have religiously kept  statement in my solrconfig.xml, thinking
> >> it was enabling the web interface statistics output.
> >>
> >> But looking at the server logs really closely, I can see that JMX is
> >> actually disabled without server present. And the Admin UI does not
> >> actually seem to care after a quick test.
> >>
> >> Does anybody have a real experience with Solr JMX? Does it expose more
> >> information than Admin UI's Plugins/Stats page? Is it good for
> >>
> >
> > Have not been using JMX lately, but we were using it in the past. It does
> > allow monitoring many useful details. As others have commented, it also
> > integrates well with other monitoring  tools as JMX is a standard.
> >
> > Regards,
> > Gora
>


Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Frankcis
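my schema.xml:

[The XML markup of the schema was stripped by the list archive. Only two
element values survive: "id" and "name".]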

update docs:
"docs": [
  {
    "name": "苹果4s",
    "type": "手机",
    "price": 2000,
    "id": "4017e35a-6b19-45b6-b945-382340ca1eec",
    "_version_": 1466799722505175000
  },
  {
    "name": "苹果5",
    "type": "手机",
    "price": 5000,
    "id": "4052d9f3-f6d9-458f-8bb0-477b17852f37",
    "_version_": 1466799735745544200
  },
  {
    "name": "三星",
    "type": "手机",
    "price": 3000,
    "id": "468abce8-8bb9-4f51-9900-8d4d6abc02ac",
    "_version_": 1466799747596550100
  },
  {
    "name": "摩托罗拉i3",
    "type": "电脑",
    "price": 1000,
    "id": "db66bb02-3d6a-4ab0-9133-2e6e38b3d4dd",
    "_version_": 1466799757491961900
  },
  {
    "name": "摩托罗拉i5",
    "type": "电脑",
    "price": 1500,
    "id": "f211525f-bc3c-4ea7-aded-1c46a94ecd1c",
    "_version_": 1466799766311534600
  }
]
Thank you, Erick.
I want to sort groups by the sum of the documents' scores within each
group. As you said, Solr excels at scoring single documents. In Solr 4.6,
the default ordering of groups depends on the maxScore of the documents
within each group, not on the sum of their scores. I can compute the sum of
the scores in the client program, but that is not a good idea. I know that
Solr's stats component can compute statistics over a long field, so I had
the idea of running it on the score, but score is a pseudo-field and
stats.field does not support it. In addition, as the schema.xml shows, I
group on the values of a string field (type) without tokenizing it.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134830.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dynamic field assignments

2014-05-05 Thread Chris Hostetter

: My understanding is that DynamicField can do something like
: FOO_BAR_TEXT_* but what I really need is *_TEXT_* as I might have
: FOO_BAR_TEXT_1 but I also might have WIDGET_BAR_TEXT_2.  Both of those
: field names need to map to a field type of 'fullText'.

I'm pretty sure you can get what you are after with the new Managed Schema 
functionality...

https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig

Assuming you have managed schema enabled in solrconfig.xml, and you define 
both of your fieldTypes using names like "text" and "select" then 
something like this should work in your processor chain... 

 
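 <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
   <str name="fieldRegex">.*_TEXT_.*</str>
   <str name="defaultFieldType">text</str>
 </processor>
 <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
   <str name="fieldRegex">.*_SELECT_.*</str>
   <str name="defaultFieldType">select</str>
 </processor>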
 


(Normally that processor is used once with multiple value->type mappings 
-- but in your case you don't care about the run-time value, just the 
run-time field name regex, which should also be configurable according 
to the various FieldNameSelector rules...)

https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html
https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html


-Hoss
http://www.lucidworks.com/


Linking Two Fields Together

2014-05-05 Thread Steve Edwards
I'm using Solr to create an image search functionality that allows users to 
search for an existing image in the site to add to new content.  A given piece 
of content has a field that can store multiple images, so I will need to use a 
multi-value Solr field to store image data. Currently, I'm storing the path and 
file name in a tom_* field, since I want to be able to search on file name. 
However, another piece of data that I need to store and retrieve is the file id 
used to identify the file in the database (in the same table as the image 
path). What is the best way to store this data so that the file id and path 
values are properly synced, since there can be multiple images for each piece 
of content?  I could just store the file path/name (I need that data to be 
searchable, so it has to be stored in Solr), and then query the db for the fid 
once I get the results back, but I'd rather not do that if I don't have to.

Searching around, it doesn't appear that I can store multiple pieces of data in 
one field without doing some sort of concatenation and then splitting at query 
time.  If I just use two separate fields in each document, is it safe to assume 
that the values will be synchronized in the search results? In other words, if 
I put two values each into tom_image_path and im_image_file_id, when I query 
and the document is returned, can I assume the values in the two fields are 
synchronized?

Or, is there a way to store multiple pieces of data in one field so that they 
can be indexed together and then retrieved together?
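
The concatenation approach mentioned above could look roughly like the following. This is only a sketch under assumptions: the class name, the "|" delimiter, and the sample path are made up for illustration, and the combined value would live in a separate stored field next to the searchable path field.

```java
class ImageRef {
    // Hypothetical delimiter; assumed never to occur in a file id.
    static final String SEP = "|";

    // Combine the database file id and the path into one stored value.
    static String join(long fileId, String path) {
        return fileId + SEP + path;
    }

    // Split on the first delimiter only, so a path that contains '|' survives.
    static String[] split(String stored) {
        int i = stored.indexOf(SEP);
        return new String[] { stored.substring(0, i), stored.substring(i + 1) };
    }

    public static void main(String[] args) {
        String stored = join(42L, "sites/default/files/cat.jpg");
        String[] parts = split(stored);
        System.out.println(stored);                      // 42|sites/default/files/cat.jpg
        System.out.println(parts[0] + " -> " + parts[1]); // 42 -> sites/default/files/cat.jpg
    }
}
```

Each value of a multi-valued field would then carry its own id, so the pairing would not depend on two fields staying in the same order.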

Thanks.

Steve

Re: Strict Search in Apache Solr

2014-05-05 Thread Jack Krupansky
The term "strict search" is not in the Lucene/Solr nomenclature - it could 
mean any number of things.


It sounds as if maybe you want to do a phrase search, looking for an exact 
phrase - yes, you can do that by enclosing the phrase in quotes.
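
As a rough illustration of what "enclosing the phrase in quotes" means on the wire, here is a minimal sketch that builds the request URL by hand. The class and method names and the core URL are assumptions made for the example; only the JDK's URLEncoder is used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class PhraseQuery {
    // Wrap the user's words in double quotes so the query parser sees one phrase.
    static String asPhrase(String words) {
        return "\"" + words.replace("\"", "\\\"") + "\"";
    }

    // Build a select URL; baseUrl is a hypothetical core address.
    static String selectUrl(String baseUrl, String q) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(q, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/collection1";
        System.out.println(selectUrl(base, "your future"));
        // http://localhost:8983/solr/collection1/select?q=your+future
        System.out.println(selectUrl(base, asPhrase("your future")));
        // http://localhost:8983/solr/collection1/select?q=%22your+future%22
    }
}
```

The only difference between the two requests is the pair of encoded quotes (%22) around the words.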


-- Jack Krupansky

-Original Message- 
From: Reyes, Mark

Sent: Monday, May 5, 2014 5:23 PM
To: solr-user@lucene.apache.org
Subject: Strict Search in Apache Solr

How could Solr accomplish an end-user behavior like a strict search?

Let’s say an end-user decides to use quotation marks in their keywords to 
provide specificity in their search results.


Current:
If you were to query: your future, then 10 results would return and print to 
the page.


Expected:
I’d like to query: “your future”, then less than 10 results would return and 
print to the page.


Regards,
Mark

IMPORTANT NOTICE: This e-mail message is intended to be received only by 
persons entitled to receive the confidential information it may contain. 
E-mail messages sent from Bridgepoint Education may contain information that 
is confidential and may be legally privileged. Please do not read, copy, 
forward or store this message unless you are an intended recipient of it. If 
you received this transmission in error, please notify the sender by reply 
e-mail and delete the message and any attachments. 



Re: Strict Search in Apache Solr

2014-05-05 Thread Reyes, Mark
Okay, let's try it this way...

CURRENTLY:
Step 1: Type, your future into the search bar.
Step 2: 10 search results return.

I'D LIKE TO SEE THIS:
Step 1: Type, “your future” into the search bar.
Step 2: 1 search result returns.

Can this be accomplished through the Solr UI?

Thanks,

Mark

On 5/5/14, 3:17 PM, "Ahmet Arslan"  wrote:

>Hi Reyes,
>
>I think your question is not clear.
>Please see : https://wiki.apache.org/solr/UsingMailingLists
>
>Ahmet
>
>On Tuesday, May 6, 2014 12:23 AM, "Reyes, Mark" 
>wrote:
>How could Solr accomplish an end-user behavior like a strict search?
>
>Let's say an end-user decides to use quotation marks in their keywords to
>provide specificity in their search results.
>
>Current:
>If you were to query: your future, then 10 results would return and print
>to the page.
>
>Expected:
>I'd like to query: “your future”, then less than 10 results would return
>and print to the page.
>
>Regards,
>Mark

Histogram facet?

2014-05-05 Thread Romain
Hi,

I am trying to plot a non-date field by time in order to draw a histogram
showing its evolution during the week.

For example, if I have a tweet index:

Tweet:
  date
  retweetCount

3 tweets indexed:
Tweet | Date  | Retweet
A     | 01/01 | 100
B     | 01/01 | 100
C     | 01/02 | 100

If I want to plot the number of tweets by day: easy with a date range facet:
Day 1: 2
Day 2: 1

But counting the number of retweets by day is not possible natively:
Day 1: 200
Day 2: 100

One current workaround would be to do a date range facet to get the date
slots, ask only for the retweet field, and compute the sums in the
client. We could compute other stats like the average there too.
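
For reference, the client-side part of that workaround might look something like this. It is a sketch only: the Tweet class is a hypothetical stand-in for whatever the Solr client returns, and the day strings are the ones from the table above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RetweetHistogram {
    // Hypothetical (day, retweetCount) pair standing in for a returned document.
    static final class Tweet {
        final String day;
        final long retweetCount;

        Tweet(String day, long retweetCount) {
            this.day = day;
            this.retweetCount = retweetCount;
        }
    }

    // Sum retweetCount per day, keeping days in first-seen order.
    static Map<String, Long> sumByDay(Iterable<Tweet> tweets) {
        Map<String, Long> sums = new LinkedHashMap<String, Long>();
        for (Tweet t : tweets) {
            Long previous = sums.get(t.day);
            sums.put(t.day, (previous == null ? 0L : previous) + t.retweetCount);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Tweet> tweets = Arrays.asList(
                new Tweet("01/01", 100), new Tweet("01/01", 100), new Tweet("01/02", 100));
        System.out.println(sumByDay(tweets)); // {01/01=200, 01/02=100}
    }
}
```

This is exactly the logic I would prefer to push to the server, since the client has to fetch every matching document to compute it.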

The closest I could find was
https://issues.apache.org/jira/browse/SOLR-4772 but it seems to be
slightly different.

Basically I am trying to do something very similar to the Date Histogram
Facet in ES.

Is there a way to move the counting logic to the Solr server?

Thanks!

Romain


Re: Strict Search in Apache Solr

2014-05-05 Thread Ahmet Arslan
Hi Reyes,

I think your question is not clear. 
Please see : https://wiki.apache.org/solr/UsingMailingLists

Ahmet

On Tuesday, May 6, 2014 12:23 AM, "Reyes, Mark"  wrote:
How could Solr accomplish an end-user behavior like a strict search?

Let’s say an end-user decides to use quotation marks in their keywords to 
provide specificity in their search results.

Current:
If you were to query: your future, then 10 results would return and print to 
the page.

Expected:
I’d like to query: “your future”, then less than 10 results would return and 
print to the page.

Regards,
Mark



Re: Relevancy help

2014-05-05 Thread Ahmet Arslan
Hi Ravi,

Regarding recency please see : 
http://www.slideshare.net/lucenerevolution/potter-timothy-boosting-documents-in-solr

Regarding "docs containing all words", there is a function query that elevates 
those docs to the top. Search past posts on the mailing list.

Ahmet


On Tuesday, May 6, 2014 12:42 AM, Ravi Solr  wrote:

Hello,
        I have a weird relevancy requirement. We search news content hence
chronology is very important and also relevancy, although both are mutually
exclusive. For example, if the search terms are -  malaysia airline crash
blackbox - my requirements are as follows

docs containing all words should be on top, but the editorial also wants
them sorted reverse by chronological order without losing relevancy. Why
?? If on day 1 there is an article about search for blackbox but on day 2
the blackbox is found and day 3 there is an article about blackbox being
unusable...from the user's standpoint it makes sense that we show most
recent content on top.

I already boost recency of docs with
boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1) i.e. increments of
3 months

However when I do the boost the chronology is messed up. I know relevancy
and sorting are mutually exclusive concepts. Is there any magic that we can
do in SOLR which can achieve both ???


Thanks,

Ravi Kiran bhaskar


Re: Relevancy help

2014-05-05 Thread Jack Krupansky
The recip function query is the proper way to boost by reverse chronological 
order, but you may have to play around with the boost factor so that date 
does not completely overwhelm the natural relevancy.


Use the debugQuery=true parameter and look at the "explain" section to see 
what the document scores look like.
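
To get a feel for the numbers before changing the boost, it can help to evaluate the function by hand. Solr's recip(x,m,a,b) computes a/(m*x+b); the sketch below plugs in the multiplier from the question. The class name and the chosen ages are just illustrative assumptions.

```java
class RecipBoost {
    // Same formula as Solr's recip(x,m,a,b) function query: a / (m*x + b).
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        double m = 7.889e-10;              // multiplier used in the boost above
        long dayMs = 24L * 60 * 60 * 1000; // one day in milliseconds
        for (int days : new int[] {0, 1, 7, 30, 90}) {
            double boost = recip((double) days * dayMs, m, 1, 1);
            System.out.printf("%3d days old -> boost %.4f%n", days, boost);
        }
        // With a = b = 1 the boost drops to 0.5 when m*x = 1, i.e. at x = 1/m ms.
        System.out.printf("half-boost age: %.1f days%n", (1.0 / m) / dayMs);
    }
}
```

With a = b = 1 a brand-new document gets a boost of 1.0, and the boost halves once the age reaches 1/m milliseconds, which for this multiplier works out to roughly two weeks. A smaller multiplier flattens the curve and lets the natural relevancy count for more.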


-- Jack Krupansky

-Original Message- 
From: Ravi Solr

Sent: Monday, May 5, 2014 5:41 PM
To: solr-user@lucene.apache.org
Subject: Relevancy help

Hello,
   I have a weird relevancy requirement. We search news content hence
chronology is very important and also relevancy, although both are mutually
exclusive. For example, if the search terms are -  malaysia airline crash
blackbox - my requirements are as follows

docs containing all words should be on top, but the editorial also wants
them sorted reverse by chronological order without losing relevancy. Why
?? If on day 1 there is an article about search for blackbox but on day 2
the blackbox is found and day 3 there is an article about blackbox being
unusable...from the user's standpoint it makes sense that we show most
recent content on top.

I already boost recency of docs with
boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1) i.e. increments of
3 months

However when I do the boost the chronology is messed up. I know relevancy
and sorting are mutually exclusive concepts. Is there any magic that we can
do in SOLR which can achieve both ???


Thanks,

Ravi Kiran bhaskar 



Re: Turning on KeywordRepeat and RemoveDups on an existing fieldType.

2014-05-05 Thread Jack Krupansky
I haven't personally used this technique, but I gather that the intent is 
that the unstemmed term will have a lower term frequency (more unique) than 
the stemmed term which may generate the same stemmed term from a number of 
different source terms.


To answer your question, no, you don't need a separate field or type for 
this feature, but it will tend to generate a lot more terms in your index 
since it will index a stemmed term as two terms.


Only use the repeat/remove filters for the index analyzer.

You will need to reindex to see the full effect immediately, but you can do 
the reindex incrementally (as you replace existing documents) as well if you 
don't mind if the difference in relevancy takes an extended time to become 
apparent.


-- Jack Krupansky

-Original Message- 
From: Michael Tracey

Sent: Monday, May 5, 2014 4:52 PM
To: solr-user@lucene.apache.org
Subject: Turning on KeywordRepeat and RemoveDups on an existing fieldType.

As per the stemming docs ( 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming ), I 
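want to score the original term higher than the stemmed version by adding:

  <filter class="solr.KeywordRepeatFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>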

to a field type that is already created (with Stemming). I have 100M 
documents in this index, and it gets slowly reindexed every month as records 
change.  My question is, can I add this to the existing fieldType, or do I 
need to make a new fieldType, and copyField the data over to it, and after 
it's all reindexed switch my code?  I'd rather be able to just add the lines 
to my fieldType because I don't think I have enough disk space on my cloud 
members to hold my primary fulltext field twice.


Just in case it helps, I'm running 4.4.0 and the field I'm wanting to mod 
looks like this:


   positionIncrementGap="100">

 
   
   generateWordParts="1" generateNumberParts="1" catenateWords="1" 
catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>

   
   words="keyword_stopwords.txt" enablePositionIncrements="true" />
   protected="protwords.txt"/>

 
 
   
   ignoreCase="true" expand="true"/>
   words="keyword_stopwords.txt" enablePositionIncrements="true" />
   generateWordParts="1" generateNumberParts="1" catenateWords="0" 
catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>

   
   protected="protwords.txt"/>

 
   

Thanks,

M. 



Relevancy help

2014-05-05 Thread Ravi Solr
Hello,
I have a weird relevancy requirement. We search news content hence
chronology is very important and also relevancy, although both are mutually
exclusive. For example, if the search terms are -  malaysia airline crash
blackbox - my requirements are as follows

docs containing all words should be on top, but the editorial also wants
them sorted reverse by chronological order without losing relevancy. Why
?? If on day 1 there is an article about search for blackbox but on day 2
the blackbox is found and day 3 there is an article about blackbox being
unusable...from the user's standpoint it makes sense that we show most
recent content on top.

I already boost recency of docs with
boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1) i.e. increments of
3 months

However when I do the boost the chronology is messed up. I know relevancy
and sorting are mutually exclusive concepts. Is there any magic that we can
do in SOLR which can achieve both ???


Thanks,

Ravi Kiran bhaskar


Strict Search in Apache Solr

2014-05-05 Thread Reyes, Mark
How could Solr accomplish an end-user behavior like a strict search?

Let’s say an end-user decides to use quotation marks in their keywords to 
provide specificity in their search results.

Current:
If you were to query: your future, then 10 results would return and print to 
the page.

Expected:
I’d like to query: “your future”, then less than 10 results would return and 
print to the page.

Regards,
Mark


Re: Error initializing QueryElevationComponent

2014-05-05 Thread Chris Hostetter

The full details are farther down in the stack...

: null:org.apache.solr.common.SolrException: SolrCore 'master' is not
: available due to init failure: Error initializing QueryElevationComponent.
...
: Caused by: org.apache.solr.common.SolrException: Error initializing
: QueryElevationComponent.
...
: Caused by: org.apache.solr.common.SolrException:
: org.xml.sax.SAXParseException; systemId: solrres:/elevate.xml; lineNumber:
: 28; columnNumber: 80; The reference to entity "ver" must end with the ';'
: delimiter.

The problem is that your elevate.xml is not a valid XML file at all -- you 
have a bare "&" character in there (as part of your "id"), which is not 
valid in XML -- you are confusing the parser into thinking that you intend 
for "&ver" to be an XML entity but you are missing the ";" at the end (and 
even if you had that, then you'd get an error that the entity "&ver;" is 
not defined) ...

: id="sitecore://master/{137f5eb3-eb84-4165-bef0-5be1fbbc3201}?lang=en&ver=1"/>


you need to use valid XML, so that id attribute should be something 
like...

id="sitecore://master/{137f5eb3-eb84-4165-bef0-5be1fbbc3201}?lang=en&amp;ver=1"
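
If the ids are generated by code, something along these lines would show the difference. It is a sketch using only the JDK's XML parser; the class name, the helper names, and the bare doc element are assumptions for the example, not the real elevate.xml layout.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ElevateIdEscape {
    // Escape the characters that are not allowed raw inside a double-quoted attribute.
    static String escapeAttr(String raw) {
        return raw.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Return the parsed id attribute, or null if the XML is not well-formed.
    static String parseId(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getAttribute("id");
        } catch (Exception e) {
            return null; // e.g. a SAXParseException for the bare "&"
        }
    }

    public static void main(String[] args) {
        String id = "sitecore://master/{137f5eb3-eb84-4165-bef0-5be1fbbc3201}?lang=en&ver=1";
        System.out.println(parseId("<doc id=\"" + id + "\"/>"));             // null
        System.out.println(parseId("<doc id=\"" + escapeAttr(id) + "\"/>")); // prints the original id
    }
}
```

The parser hands back the unescaped value, so the id Solr sees still contains the plain "&".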


-Hoss
http://www.lucidworks.com/


Turning on KeywordRepeat and RemoveDups on an existing fieldType.

2014-05-05 Thread Michael Tracey
As per the stemming docs ( 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming ), I want 
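to score the original term higher than the stemmed version by adding:

   <filter class="solr.KeywordRepeatFilterFactory"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>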

to a field type that is already created (with Stemming). I have 100M documents 
in this index, and it gets slowly reindexed every month as records change.  My 
question is, can I add this to the existing fieldType, or do I need to make a 
new fieldType, and copyField the data over to it, and after it's all reindexed 
switch my code?  I'd rather be able to just add the lines to my fieldType 
because I don't think I have enough disk space on my cloud members to hold my 
primary fulltext field twice.

Just in case it helps, I'm running 4.4.0 and the field I'm wanting to mod looks 
like this:


  





  
  






  


Thanks,

M.


Re: Core failure when a lot of processes are indexing

2014-05-05 Thread Erick Erickson
Take a look through the article I linked, 5 minutes may be an issue
since the transaction log will hold all 5 minutes worth of input. In
batch processes this can be quite a bit of data. Worse, when a Solr
instance terminates unexpectedly, the entire transaction log can be
replayed.

Consider setting your autoCommit maxTime to something much shorter, say
30 seconds. Or even less. NOTE: openSearcher should be false.

Then set your soft commit time to the latency you can stand, i.e. if
the users don't need to be able to search for a long time you can set
this to hours.

FWIW,
Erick

On Mon, May 5, 2014 at 11:03 AM, Hakim Benoudjit  wrote:
> I've tried it & it worked by letting solr do the commit instead of my solr
> client.
> In solrconfig.xml:
> autocommit max_time has been set to 5 minutes & autosoftcommit max_time to
> something bigger.
>
> Thanks a lot guys!
>
>
> 2014-05-05 16:30 GMT+01:00 Erick Erickson :
>
>> You should not be committing from the client by and large, use the
>> <autoCommit> and <autoSoftCommit> options in solrconfig.xml.
>>
>> See:
>> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> Best,
>> Erick
>>
>> On Mon, May 5, 2014 at 8:12 AM, Hakim Benoudjit 
>> wrote:
>> > Is there an option in Solr (solrconfig.xml or somewhere else) to
>> regularize
>> > commits to the index.
>> > I meant to do a 'sleep' between each commit to the index, when data
>> > to-be-indexed is waiting inside a stack.
>> >
>> >
>> > 2014-05-05 15:58 GMT+01:00 Hakim Benoudjit :
>> >
>> >> The index is made with the same version of solr, that is searching
>> >> (4.6.0), the config file (solrconfig.xml) & schema.xml is the same too.
>> >> The only way for me to solve this issue is to let only one process to
>> >> index at the same time. Wouldnt a layer of message queue resolve this
>> issue?
>> >>
>> >>
>> >> 2014-05-04 18:33 GMT+01:00 Shawn Heisey :
>> >>
>> >> On 5/4/2014 9:30 AM, Hakim Benoudjit wrote:
>> >>> > Ok. These files contain what you've requested:
>> >>> >
>> >>> > First (the xml error): http://pastebin.com/ZcagK3T7
>> >>> > Second (java params): http://pastebin.com/JtWQpp6s
>> >>> > Third (Solr version): http://pastebin.com/wYdpdsAW
>> >>>
>> >>> Are you running with an index originally built by an earlier version of
>> >>> Solr?  If you are, you may be running into a known bug.  The last
>> >>> "caused by" section of the java stacktrace looks similar to the one in
>> >>> this issue -- which is indeed index corruption:
>> >>>
>> >>> https://issues.apache.org/jira/browse/LUCENE-5377
>> >>>
>> >>> If that's the problem you're experiencing, upgrading your Solr version
>> >>> will hopefully fix it.  Simply dropping in the 4.6.1 war file and any
>> >>> contrib jars should cause zero problems for your 4.6.0 install.
>> >>> Upgrading to 4.7.2 or 4.8.0 should be done with more care.
>> >>>
>> >>> Thanks,
>> >>> Shawn
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> Hakim Benoudjit.
>> >>
>> >
>> >
>> >
>> > --
>> > Hakim Benoudjit.
>>
>
>
>
> --
> Hakim Benoudjit.


Re: can't make GET request to solr in android app

2014-05-05 Thread blach
I have added the reference to this library correctly, but it still gives
me the same error.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134785.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Odd XSLT behavior

2014-05-05 Thread Christopher Gross
Checked that first -- it's a test site with a small sample size.  The field
is set in all of the items.  And refreshing the query a few times can yield
either result (with/without the error).

I'm reverting back to an old version of my stack (my code, plus tomcat &
solr), I'll step through my previous work slowly to see if I can pinpoint
what breaks it.  If I can (ever) determine what caused it then I'll post it.

Thanks!

-- Chris


On Mon, May 5, 2014 at 2:05 PM, Chris Hostetter wrote:

>
> Shot in the dark: perhaps you have a doc w/o a value in the description
> field, which means the xsl:variable's select doesn't match anything; which
> perhaps means that your XSLT engine then leaves the variable undefined.
>
>
> : Solr 4.7.2 (and 4.6.1)
> : Tomcat 7.0.52
> : Java 1.7.0_45 (and _55)
> :
> : I'm getting some really odd behavior with some XSLT documents.  I've been
> : doing some upgrades to Java & Solr and I'm trying to narrow down where
> the
> : problems are happening.
> :
> : I have a few XSLT docs that I put into the conf/xslt directory for my
> : indexes. I haven't changed them in a while, and they were working fine
> for a
> : 3.X Solr, and seemed to work fine on an earlier 4.X release.
> :
> : The problem is that sometimes I get an error saying that a field can't be
> : found.   Here's a slice of the XSLT:
> :   
> : 
> : 
> : 
> : 
> :
> : http://www.w3.org/2005/Atom";>
> :   
> :   
> :  select="str[@name='url']"
> : />
> :   
> :   
> : 
> :   
> : 
> :   
> :   
> : 
> :   
> : 
> :
> :.
> : 
> :
> :I get messages saying that it can't find the "description" variable.
> : This was working perfectly well, but I can't seem to narrow down a
> specific
> : change that caused this.
> :
> : Caused by: javax.xml.transform.TransformerConfigurationException:
> : solrres:/xslt/osatom.xsl: line 115: Variable or parameter 'description'
> is
> : undefined.
> : at
> :
> com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:964)
> : at
> :
> org.apache.solr.util.xslt.TransformerProvider.getTemplates(TransformerProvider.java:110)
> :
> : Has anyone run into a problem like this?  Thanks!
> :
> : -- Chris
> :
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Help to Understand a Solr Query

2014-05-05 Thread nativecoder
That answer helps a lot

Where would the OR clause be ?

(Exact_Field1:samplestring1 OR Exact_Field1:samplestring2) AND
(Exact_Field2:samplestring1 OR Exact_Field2:samplestring2) AND
(Field1:samplestring1 OR Field1:samplestring2) AND (Field2:samplestring1
OR Field2:samplestring2)

Please note that in my query it is an AND clause. I am trying to understand
where the AND fits in. To be more precise my query is as below

q=samplestring1 AND samplestring2&defType: edismax&qf: Exact_Field1^1.0
Exact_Field2^0.9 Field1^0.8 Field2^0.7&fl= Column1, Column2 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134775.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: interpretation of cat_rank in http://people.apache.org/~hossman/ac2012eu/

2014-05-05 Thread Chris Hostetter
: Hi everybody
:   can anyone give me a suitable interpretation for cat_rank in
: http://people.apache.org/~hossman/ac2012eu/ slide 15

Have you seen the video?  

http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55822630

That slide starts ~ 23:00 and i go through a description of this example.

TL;DW: cat_rank in this example would be a numeric ranking of the category 
the product is in - so cat_rank==N means the product is in the Nth most 
popular category on the site (so lower is better, but the number is always 
a positive integer)




-Hoss
http://www.lucidworks.com/


Re: can't make GET request to solr in android app

2014-05-05 Thread Shawn Heisey
On 5/5/2014 12:17 PM, blach wrote:
> Thank you Shawn 
>
> I did what you told me. now this is my code:



> it gives me error that org.apache.solr.client.solrj is not found 

I don't know how to do classpath management in the Android environment. 
You'll need to add the solrj jar to your application classpath.  In the
download that I have extracted on my computer, this is named
dist/solr-solrj-4.7.2.jar ... the version number is usually in the
filename.  A number of other jars are also required.  You can find these
in the dist/solrj-lib directory.  If you need a newer or slightly older
version of one of the dependent jars for your own code, it is usually OK
to use a slightly different version.

Thanks,
Shawn



Re: can't make GET request to solr in android app

2014-05-05 Thread blach
Thank you Shawn 

I did what you told me. now this is my code:


import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

@Override
public void onClick(View v) {
    // Connect to the Solr core over HTTP.
    String urlString = "http://localhost:8983/solr/collection1";
    SolrServer solr = new HttpSolrServer(urlString);

    // Query the /select handler for "mem".
    SolrQuery query = new SolrQuery();
    query.set("qt", "/select");
    query.set("q", "mem");

    try {
        QueryResponse response = solr.query(query);
        SolrDocumentList results = response.getResults();
        for (int i = 0; i < results.size(); ++i) {
            // A SolrDocument is not a CharSequence, so show its string form.
            etxt2.setText(results.get(i).toString());
        }
    } catch (SolrServerException e) {
        e.printStackTrace();
    }
}});   }




it gives me error that org.apache.solr.client.solrj is not found 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134769.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Stored vs non-stored very large text fields

2014-05-05 Thread Jochen Barth
I found out that "storing" documents as separate docs+id does not
help either.

You must have a completely separate collection/core to get things to work fast.

Kind regards,
Jochen


Quoting Jochen Barth:


Ok, https://wiki.apache.org/solr/SolrPerformanceFactors

states that: "Retrieving the stored fields of a query result can be  
a significant expense. This cost is affected largely by the number  
of bytes stored per document--the higher byte count, the sparser the  
documents will be distributed on disk and more I/O is necessary to  
retrieve the fields (usually this is a concern when storing large  
fields, like the entire contents of a document)."


But in my case (with docValues=true) there should be no reason to  
access *.fdt.


Kind regards,
Jochen

Quoting Jochen Barth:


Something is really strange here:

even when configuring fields id + sort_... to docValues="true" --  
so there's nothing to get from "stored documents file" --  
performance is still terrible with ocr stored=true _even_ with my  
patch which stores uncompressed like solr4.0.0 (checked with  
strings -a on *.fdt).


Just reading
http://lucene.472066.n3.nabble.com/Can-Solr-handle-large-text-files-td3439504.html
.. perhaps things will clear up soon (will check if splitting into
indexed+non-stored and non-indexed+stored could help here)



Kind regards,
J. Barth


Quoting Shawn Heisey:


On 4/29/2014 4:20 AM, Jochen Barth wrote:

BTW: stored field compression:
are all "stored fields" within a document put into one compressed chunk,
or is this done on a per-field basis?


Here's the issue that added the compression to Lucene:

https://issues.apache.org/jira/browse/LUCENE-4226

It was made the default stored field format for Lucene, which also made
it the default for Solr.  At this time, there is no way to remove
compression on Solr without writing custom code.  I filed an issue to
make it configurable, but I don't know how to do it.  Nobody else has
offered a solution either.  One day I might find some time to take a
look at the issue and see if I can solve it myself.

https://issues.apache.org/jira/browse/SOLR-4375

Here's the author's blog post that goes into more detail than the LUCENE
issue:

http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene

Thanks,
Shawn





Re: Odd XSLT behavior

2014-05-05 Thread Chris Hostetter

Shot in the dark: perhaps you have a doc w/o a value in the description 
field, which means the xsl:variable's select doesn't match anything; which 
perhaps means that your XSLT engine then leaves the variable undefined.


: Solr 4.7.2 (and 4.6.1)
: Tomcat 7.0.52
: Java 1.7.0_45 (and _55)
: 
: I'm getting some really odd behavior with some XSLT documents.  I've been
: doing some upgrades to Java & Solr and I'm trying to narrow down where the
: problems are happening.
: 
: I have a few XSLT docs that I put into the conf/xslt directory for my
: indexes.  I haven't changed them in a while, and they were working fine for a
: 3.X Solr, and seemed to work fine on an earlier 4.X release.
: 
: The problem is that sometimes I get an error saying that a field can't be
: found.   Here's a slice of the XSLT:
:
: [XSLT markup stripped by the list archive; only the Atom namespace
: declaration, http://www.w3.org/2005/Atom, survives]
: 
:I get messages saying that it can't find the "description" variable.
: This was working perfectly well, but I can't seem to narrow down a specific
: change that caused this.
: 
: Caused by: javax.xml.transform.TransformerConfigurationException:
: solrres:/xslt/osatom.xsl: line 115: Variable or parameter 'description' is
: undefined.
: at
: com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:964)
: at
: org.apache.solr.util.xslt.TransformerProvider.getTemplates(TransformerProvider.java:110)
: 
: Has anyone run into a problem like this?  Thanks!
: 
: -- Chris
: 

-Hoss
http://www.lucidworks.com/


Re: Help to Understand a Solr Query

2014-05-05 Thread nativecoder
That answer helps a lot

Where would the OR clause be ? 

(Exact_Field1:samplestring1 *OR* Exact_Field1:samplestring2) AND
(Exact_Field2:samplestring1 *OR* Exact_Field2:samplestring2) AND
(Field1:samplestring1 *OR* Field1:samplestring2) AND (Field2:samplestring1
*OR* Field2:samplestring2)

Please note that in my query it is an AND clause. I am trying to understand
where the AND fits in.

*query=samplestring1 AND samplestring2*
defType: edismax
queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
fieldList: Column1, Column2 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134763.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Core failure when a lot of processes are indexing

2014-05-05 Thread Hakim Benoudjit
I've tried it, and it worked: I now let Solr do the commits instead of my Solr
client.
In solrconfig.xml:
autoCommit maxTime has been set to 5 minutes and autoSoftCommit maxTime to
something bigger.
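
For readers landing on this thread, a minimal sketch of what such a setup could look like in solrconfig.xml. The 5-minute hard commit comes from the message above; the 10-minute soft commit and the openSearcher=false setting are assumptions for illustration, not the poster's actual values.

```xml
<!-- solrconfig.xml (sketch): let Solr commit on a timer instead of the client -->
<updateHandler class="solr.DirectUpdateHandler2">

  <!-- Hard commit: flushes pending index changes to stable storage. -->
  <autoCommit>
    <maxTime>300000</maxTime>            <!-- 5 minutes, in milliseconds -->
    <openSearcher>false</openSearcher>   <!-- assumption: visibility is left to soft commits -->
  </autoCommit>

  <!-- Soft commit: makes newly indexed documents visible to searches. -->
  <autoSoftCommit>
    <maxTime>600000</maxTime>            <!-- assumption: "something bigger", here 10 minutes -->
  </autoSoftCommit>

</updateHandler>
```

With this in place the indexing processes only send documents and never call commit themselves, so concurrent writers no longer trigger overlapping commits.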

Thanks a lot guys!


2014-05-05 16:30 GMT+01:00 Erick Erickson :

> You should not be committing from the client by and large, use the
> <autoCommit> and <autoSoftCommit> options in solrconfig.xml.
>
> See:
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Best,
> Erick
>
> On Mon, May 5, 2014 at 8:12 AM, Hakim Benoudjit 
> wrote:
> > Is there an option in Solr (solrconfig.xml or somewhere else) to
> regularize
> > commits to the index.
> > I meant to do a 'sleep' between each commit to the index, when data
> > to-be-indexed is waiting inside a stack.
> >
> >
> > 2014-05-05 15:58 GMT+01:00 Hakim Benoudjit :
> >
> >> The index is made with the same version of solr, that is searching
> >> (4.6.0), the config file (solrconfig.xml) & schema.xml is the same too.
> >> The only way for me to solve this issue is to let only one process to
> >> index at the same time. Wouldnt a layer of message queue resolve this
> issue?
> >>
> >>
> >> 2014-05-04 18:33 GMT+01:00 Shawn Heisey :
> >>
> >> On 5/4/2014 9:30 AM, Hakim Benoudjit wrote:
> >>> > Ok. These files contain what you've requested:
> >>> >
> >>> > First (the xml error): http://pastebin.com/ZcagK3T7
> >>> > Second (java params): http://pastebin.com/JtWQpp6s
> >>> > Third (Solr version): http://pastebin.com/wYdpdsAW
> >>>
> >>> Are you running with an index originally built by an earlier version of
> >>> Solr?  If you are, you may be running into a known bug.  The last
> >>> "caused by" section of the java stacktrace looks similar to the one in
> >>> this issue -- which is indeed index corruption:
> >>>
> >>> https://issues.apache.org/jira/browse/LUCENE-5377
> >>>
> >>> If that's the problem you're experiencing, upgrading your Solr version
> >>> will hopefully fix it.  Simply dropping in the 4.6.1 war file and any
> >>> contrib jars should cause zero problems for your 4.6.0 install.
> >>> Upgrading to 4.7.2 or 4.8.0 should be done with more care.
> >>>
> >>> Thanks,
> >>> Shawn
> >>>
> >>>
> >>
> >>
> >> --
> >> Hakim Benoudjit.
> >>
> >
> >
> >
> > --
> > Hakim Benoudjit.
>



-- 
Hakim Benoudjit.


Re: What are the best practices on Multiple Language support in Solr Cloud ?

2014-05-05 Thread shamik
Thanks Nicole. Leveraging dynamic field definitions is a great idea. It will
probably work for me, as I have a bunch of fields which are indexed as "String".
Just curious about the sharding: are you using SolrCloud? I thought of taking
the dedicated shard / core route, but since I'm using a composite key (for
dedup), managing dedicated cores can cause issues at times.

As far as the single-field representation goes, thanks for validating my concern.
It's probably best used when you have to address multilingual search.
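
As a rough illustration of the dynamic-field approach mentioned above: the language suffixes are assumptions made for this sketch, while text_en, text_fr and text_de are field types defined in the example schema that ships with Solr 4.x.

```xml
<!-- schema.xml (sketch): one dynamic field pattern per language -->
<dynamicField name="*_en" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_fr" type="text_fr" indexed="true" stored="true"/>
<dynamicField name="*_de" type="text_de" indexed="true" stored="true"/>
```

A document would then carry, say, title_en or title_fr depending on its language (title being a hypothetical field name), and a query can list the relevant fields in qf.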



--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-are-the-best-practices-on-Multiple-Language-support-in-Solr-Cloud-tp4134006p4134743.html
Sent from the Solr - User mailing list archive at Nabble.com.


Odd XSLT behavior

2014-05-05 Thread Christopher Gross
Solr 4.7.2 (and 4.6.1)
Tomcat 7.0.52
Java 1.7.0_45 (and _55)

I'm getting some really odd behavior with some XSLT documents.  I've been
doing some upgrades to Java & Solr and I'm trying to narrow down where the
problems are happening.

I have a few XSLT docs that I put into the conf/xslt directory for my
indexes.  I haven't changed them in a while, and they were working fine for a
3.X Solr, and seemed to work fine on an earlier 4.X release.

The problem is that sometimes I get an error saying that a field can't be
found.   Here's a slice of the XSLT:

[XSLT markup stripped by the list archive; only the Atom namespace
declaration, http://www.w3.org/2005/Atom, survives]


   I get messages saying that it can't find the "description" variable.
This was working perfectly well, but I can't seem to narrow down a specific
change that caused this.

Caused by: javax.xml.transform.TransformerConfigurationException:
solrres:/xslt/osatom.xsl: line 115: Variable or parameter 'description' is
undefined.
at
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:964)
at
org.apache.solr.util.xslt.TransformerProvider.getTemplates(TransformerProvider.java:110)

Has anyone run into a problem like this?  Thanks!

-- Chris


Re: can't make GET request to solr in android app

2014-05-05 Thread Shawn Heisey
On 5/5/2014 11:05 AM, blach wrote:
> I wrote this code for it, but its the same problem, in this case all
> the app is stopping, this is the code
>
> String urlString = "http://localhost:8983/solr";
> SolrServer solr = new HttpSolrServer(urlString);
>
> SolrQuery query = new SolrQuery();
> query.set("q", "mem");
> QueryResponse response = null;
> try {
>     response = solr.query(query);
> } catch (SolrServerException e) {
>     // TODO Auto-generated catch block
>     e.printStackTrace();
> }
> SolrDocumentList results = response.getResults();
> for (int i = 0; i < results.size(); ++i) {
>     etxt2.setText((CharSequence) results.get(i));
> }

Do you get any output to stderr?  Have you looked in the solr logfile to
see if there's an error logged there?

Note that you should add the core name to the URL -- using a path of
just /solr is deprecated in the newest Solr versions.

http://localhost:8983/solr/corename

Thanks,
Shawn



Re: can't make GET request to solr in android app

2014-05-05 Thread blach
Yes, I'm reading about SolrJ now.

I wrote this code for it, but it's the same problem; in this case the whole
app stops. This is the code:

String urlString = "http://localhost:8983/solr";
SolrServer solr = new HttpSolrServer(urlString);

SolrQuery query = new SolrQuery();
query.set("q", "mem");

QueryResponse response = null;

try {
    response = solr.query(query);
} catch (SolrServerException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

SolrDocumentList results = response.getResults();
for (int i = 0; i < results.size(); ++i) {
    etxt2.setText((CharSequence) results.get(i));
}




--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134735.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help to Understand a Solr Query

2014-05-05 Thread Jack Krupansky
"dismax" means Disjunction Maximum, which means Lucene takes the highest 
scoring clause (field), for each search term. This is effectively an OR of 
the clauses.
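
To make that concrete for the query in this thread, here is a hedged sketch. It assumes the client-side names map onto Solr's request parameters as queryFields = qf, fieldList = fl, resultRows = rows and startRow = start (the thread does not confirm this), and the handler name is illustrative. The comment shows the approximate shape of the parsed query; the exact terms depend on each field's analysis chain, and adding debugQuery=true to a request shows what Solr actually built.

```xml
<!-- solrconfig.xml (sketch): the thread's parameters as handler defaults -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7</str>
    <str name="fl">Column1,Column2</str>
    <int name="rows">10</int>
    <int name="start">0</int>
  </lst>
</requestHandler>

<!--
  q=samplestring1 AND samplestring2 then parses to roughly:

    +( Exact_Field1:samplestring1^1.0 | Exact_Field2:samplestring1^0.9
     | Field1:samplestring1^0.8       | Field2:samplestring1^0.7 )
    +( Exact_Field1:samplestring2^1.0 | Exact_Field2:samplestring2^0.9
     | Field1:samplestring2^0.8       | Field2:samplestring2^0.7 )

  The AND applies between the two groups (each term must match somewhere);
  the OR-like "|" applies across the fields inside each group.
-->
```

So the AND sits between the search terms, and the disjunction sits between the fields for each term; a document does not need to contain both strings in every field.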



-- Jack Krupansky
-Original Message- 
From: nativecoder

Sent: Monday, May 5, 2014 11:21 AM
To: solr-user@lucene.apache.org
Subject: Re: Help to Understand a Solr Query

I already went through the link. I understand about the boosting factor for
the relevancy

query=samplestring1 AND samplestring2
defType: edismax
queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
fieldList: Column1, Column2

I need to understand whether the samplestring1 and samplestring 2 both will
be searched in each field mentioned in queryFields. What I meant was ;

e.g (Exact_Field1:samplestring1 AND Exact_Field1:samplestring2) AND
(Exact_Field2:samplestring1 AND Exact_Field2:samplestring2) AND
(Field1:samplestring1 AND Field1:samplestring2) AND (Field2:samplestring1
AND Field2:samplestring2)

Is the above correct ?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134714.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: can't make GET request to solr in android app

2014-05-05 Thread Shawn Heisey
On 5/5/2014 9:02 AM, blach wrote:
> It's not an error if you see my code, there is a catch statement, which
> contains the "FAIL" message, it does always show it.

In your code, you are not printing the stack trace or throwing the
exception.  If you want to see it in your own code, you'll need to
include code to write out the stacktrace from the exception.  If you
don't want to do that, you can look on the server log to see what the
exception is.

Since you are basically writing Java code (I'm aware that Dalvik is not
*actually* Java, but I've never written code for android), can you use
SolrJ instead of HttpClient?

Thanks,
Shawn



Re: Core failure when a lot of processes are indexing

2014-05-05 Thread Erick Erickson
You should not be committing from the client by and large, use the
<autoCommit> and <autoSoftCommit> options in solrconfig.xml.

See: 
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Mon, May 5, 2014 at 8:12 AM, Hakim Benoudjit  wrote:
> Is there an option in Solr (solrconfig.xml or somewhere else) to regularize
> commits to the index.
> I meant to do a 'sleep' between each commit to the index, when data
> to-be-indexed is waiting inside a stack.
>
>
> 2014-05-05 15:58 GMT+01:00 Hakim Benoudjit :
>
>> The index is made with the same version of solr, that is searching
>> (4.6.0), the config file (solrconfig.xml) & schema.xml is the same too.
>> The only way for me to solve this issue is to let only one process to
>> index at the same time. Wouldnt a layer of message queue resolve this issue?
>>
>>
>> 2014-05-04 18:33 GMT+01:00 Shawn Heisey :
>>
>> On 5/4/2014 9:30 AM, Hakim Benoudjit wrote:
>>> > Ok. These files contain what you've requested:
>>> >
>>> > First (the xml error): http://pastebin.com/ZcagK3T7
>>> > Second (java params): http://pastebin.com/JtWQpp6s
>>> > Third (Solr version): http://pastebin.com/wYdpdsAW
>>>
>>> Are you running with an index originally built by an earlier version of
>>> Solr?  If you are, you may be running into a known bug.  The last
>>> "caused by" section of the java stacktrace looks similar to the one in
>>> this issue -- which is indeed index corruption:
>>>
>>> https://issues.apache.org/jira/browse/LUCENE-5377
>>>
>>> If that's the problem you're experiencing, upgrading your Solr version
>>> will hopefully fix it.  Simply dropping in the 4.6.1 war file and any
>>> contrib jars should cause zero problems for your 4.6.0 install.
>>> Upgrading to 4.7.2 or 4.8.0 should be done with more care.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>
>>
>> --
>> Hakim Benoudjit.
>>
>
>
>
> --
> Hakim Benoudjit.


Re: Solr Not Searching while INDEXING the DATA

2014-05-05 Thread Shawn Heisey
On 5/5/2014 5:39 AM, Sohan Kalsariya wrote:
> I am not able to search for the data while indexing.
> Indexing is done via the dataimport handler.
> While searching for the documents (in between indexing is happening), it
> gives the broken pipe exception and wont search anything.
> What should be the proper solution for this problem?

A broken pipe exception means that your client gave up and timed out
before Solr could respond, so it closed the TCP connection.  When Solr
finally was able to respond, the connection was gone, so the servlet
container logged that exception.

The most common reason for underlying performance issues that causes
problems like this is that you don't have enough RAM.  It could be
something else, of course.  A number of possible options are covered on
this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

I see that you asked the same question on the IRC channel early this
morning (in my timezone), but you were gone before I was awake to see that.

Thanks,
Shawn



Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Erick Erickson
I don't think so. Solr excels at getting the score of single
documents, not aggregation.

It's not at all clear to me, though, that the sum of documents' scores
is a reasonable thing to sort by. Consider grouping on a very common
term. You'd never do this, but group on the elements of a text field.
Then the group 'a' would sort to the top almost always (or maybe 'the'
or...).

This sounds like an XY problem, what use-case are you trying to solve?

Best,
Erick

On Sun, May 4, 2014 at 9:31 PM, frank shi  wrote:
> Currently, solr grouping (http://wiki.apache.org/solr/FieldCollapsing) sorts
> groups "by the score of the top document within each group". E.g.
> [...]
> "groups":[{
> "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
> "doclist":{"numFound":9,"start":0,"maxScore":4.729042,"docs":[
> {
>   "id":"7481df771afe39fab368ce19dfeeb528",
>   [...],
>   "score":4.729042},
> {
>   "id":"c879e95b5f16343dad8b1248133727c2",
>   [...],
>   "score":4.6635237},
> {
>   "id":"485b9aec90fd3ef381f013c51ab6a4df",
>   [...],
>   "score":4.347174}]
> }},
> [...]
> Is there an out-of-the-box way to sort groups by the sum of the scores of
> the documents within each group? E.g.
> [...]
> "groups":[{
> "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
> "doclist":{"numFound":9,"start":0,"scoreSum":13.739738,"docs":[
> {
>   "id":"7481df771afe39fab368ce19dfeeb528",
>   [...],
>   "score":4.729042},
> {
>   "id":"c879e95b5f16343dad8b1248133727c2",
>   [...],
>   "score":4.6635237},
> {
>   "id":"485b9aec90fd3ef381f013c51ab6a4df",
>   [...],
>   "score":4.347174}]
> }},
> [...]
> With the release of sorting by Function Query
> (https://issues.apache.org/jira/browse/SOLR-1297), it seems that there
> should be a way to use the sum() function
> (http://wiki.apache.org/solr/FunctionQuery). But it's not quite close enough
> since the "score" field is not part of the documents.
>
> I feel like I'm close but I'm missing some obvious piece. I'm using Solr
> 4.6.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134607.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help to Understand a Solr Query

2014-05-05 Thread nativecoder
I already went through the link. I understand about the boosting factor for
the relevancy

query=samplestring1 AND samplestring2
defType: edismax
queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
fieldList: Column1, Column2 

I need to understand whether samplestring1 and samplestring2 will both be
searched in each field mentioned in queryFields. What I meant was:

e.g (Exact_Field1:samplestring1 AND Exact_Field1:samplestring2) AND
(Exact_Field2:samplestring1 AND Exact_Field2:samplestring2) AND
(Field1:samplestring1 AND Field1:samplestring2) AND (Field2:samplestring1
AND Field2:samplestring2)

Is the above correct ?
 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134714.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr does not recognize language

2014-05-05 Thread Ahmet Arslan
Hi Victor,

I don't know mysolr, but I assume you are using /update/json. Let's add your chain
to the defaults section of that handler:

  

        
         application/json
         langid
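
The XML in this message lost its tags in the archive. A hedged reconstruction, assuming the stock /update/json handler from the Solr 4.x example solrconfig.xml: the class name and the stream.contentType parameter name are taken from that example config, not from this thread, and may differ in a given install.

```xml
<!-- solrconfig.xml (sketch): attach the langid chain to the JSON update handler -->
<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler">
  <lst name="defaults">
    <str name="stream.contentType">application/json</str>
    <!-- run every JSON update through the chain named "langid" -->
    <str name="update.chain">langid</str>
  </lst>
</requestHandler>
```

After changing the handler, the core has to be reloaded and the documents re-indexed before a lang field can show up.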
       
  




On Monday, May 5, 2014 4:06 PM, Victor Pascual  
wrote:
Hi there,

I'm indexing my documents using mysolr. I mainly generate a list of JSON
objects and then run: solr.update(documents_array,'json')



On Mon, May 5, 2014 at 1:08 PM, Ahmet Arslan  wrote:

> Hi Victor,
>
> How do you index your documents? Your last config looks correct. However,
> if you use the data import handler, for example, you need to add update.chain
> there too. The same goes for the extraction request handler if you are using Solr Cell.
>
>  class="org.apache.solr.handler.dataimport.DataImportHandler">
>     
>       /home/username/data-config.xml
>       langid
>     
>   
>
> By the way The URL
> http://localhost:8080/solr/update?commit=true&update.chain=langid was
> just an example and meant to feed xml update messages by POST method. Not
> to use in a browser.
>
> Ahmet
>
> On Monday, May 5, 2014 11:04 AM, Victor Pascual <
> vic...@mobilemediacontent.com> wrote:
>
> Thank you very much for your help, Ahmet.
>
> However, the language detection is still not working. :(
> My solrconfig.xml didn't contain that lst section inside the update
> requestHandler.
> That's the content I added:
>
>    >                  class="solr.XmlUpdateRequestHandler">
> >       
> >         langid
> >       
> >    
> >
>
>    
> >        class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
> >          
> >            text
> >            lang
> >          
> >        
> >        
> >       
> >     
>
> Now, your suggested query
> http://localhost:8080/solr/update?commit=true&update.chain=langid returns
>
> 
> >
> >0
> >14
> >
> >
> And there is still no lang field in my documents.
> Any idea what am I doing wrong?
>
>
>
>
> On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan  wrote:
>
> Hi,
> >
> >solr/update should be used, not /solr/select
> >
> >curl 'http://localhost:8983/solr/update?commit=true&update.chain=langid'
> >
> >By the way don't you have following definition in your solrconfig.xml?
> >
> > 
> >
> >       
> >         langid
> >       
> >  
> >
> >
> >
> >
> >On Tuesday, April 29, 2014 4:50 PM, Victor Pascual <
> vic...@mobilemediacontent.com> wrote:
> >Hi Ahmet,
> >
> >thanks for your reply. Adding &update.chain=langid to my query doesn't
> >work: IP:8080/solr/select/?q=*%3A*&update.chain=langid
> >Regarding defining the chain in an UpdateRequestHandler... sorry for the
> >lame question but shall I paste those three lines to solrconfig.xml, or
> >shall I add them somewhere else?
> >
> >There is not UpdateRequestHandler in my solrconfig.
> >
> >Thanks!
> >
> >
> >
> >On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan  wrote:
> >
> >> Hi,
> >>
> >> Did you attach your chain to a UpdateRequestHandler?
> >>
> >> You can do it by adding &update.chain=langid to the URL or defining it
> in
> >> a defaults section as follows
> >>
> >> 
> >>      langid
> >>    
> >>
> >>
> >>
> >> On Tuesday, April 29, 2014 3:18 PM, Victor Pascual <
> >> vic...@mobilemediacontent.com> wrote:
> >> Dear all,
> >>
> >> I'm a new user of Solr. I've managed to index a bunch of documents (in
> >> fact, they are tweets) and everything works quite smoothly.
> >>
> >> Nevertheless it looks like Solr doesn't detect the language of my
> documents
> >> nor remove stopwords accordingly so I can extract the most frequent
> terms.
> >>
> >> I've added this piece of XML to my solrconfig.xml as well as the Tika
> lib
> >> jars.
> >>
> >>     
> >>         >>
> >>
> class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
> >>           
> >>             text
> >>             lang
> >>           
> >>         
> >>         
> >>        
> >>      
> >>
> >> There is no error in the tomcat log file, so I have no clue of why this
> >> isn't working.
> >> Any hint on how to solve this problem will be much appreciated!
> >>
> >
> >
>
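
The chain definition quoted at the end of this message also lost its tags in the archive. For reference, a sketch of what a LangDetect chain with these values typically looks like. The langid.fl and langid.langField parameter names and the two trailing processors are assumptions based on Solr's language detection documentation, since only the factory class name and the values "text" and "lang" survive in the thread.

```xml
<!-- solrconfig.xml (sketch): language detection update chain -->
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <str name="langid.fl">text</str>          <!-- field(s) to inspect -->
      <str name="langid.langField">lang</str>   <!-- field that receives the detected language code -->
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

It is also worth checking that a field named lang (or a matching dynamic field) is defined in schema.xml, and that the chain name matches the update.chain value used by the handler that actually receives the updates.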



Re: Core failure when a lot of processes are indexing

2014-05-05 Thread Hakim Benoudjit
Is there an option in Solr (solrconfig.xml or somewhere else) to regulate
commits to the index?
I mean something like a 'sleep' between commits to the index while the data
to be indexed is waiting in a stack.


2014-05-05 15:58 GMT+01:00 Hakim Benoudjit :

> The index is made with the same version of solr, that is searching
> (4.6.0), the config file (solrconfig.xml) & schema.xml is the same too.
> The only way for me to solve this issue is to let only one process to
> index at the same time. Wouldnt a layer of message queue resolve this issue?
>
>
> 2014-05-04 18:33 GMT+01:00 Shawn Heisey :
>
> On 5/4/2014 9:30 AM, Hakim Benoudjit wrote:
>> > Ok. These files contain what you've requested:
>> >
>> > First (the xml error): http://pastebin.com/ZcagK3T7
>> > Second (java params): http://pastebin.com/JtWQpp6s
>> > Third (Solr version): http://pastebin.com/wYdpdsAW
>>
>> Are you running with an index originally built by an earlier version of
>> Solr?  If you are, you may be running into a known bug.  The last
>> "caused by" section of the java stacktrace looks similar to the one in
>> this issue -- which is indeed index corruption:
>>
>> https://issues.apache.org/jira/browse/LUCENE-5377
>>
>> If that's the problem you're experiencing, upgrading your Solr version
>> will hopefully fix it.  Simply dropping in the 4.6.1 war file and any
>> contrib jars should cause zero problems for your 4.6.0 install.
>> Upgrading to 4.7.2 or 4.8.0 should be done with more care.
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
> Hakim Benoudjit.
>



-- 
Hakim Benoudjit.


Re: Wildcard malfunctioning

2014-05-05 Thread Shawn Heisey
On 5/5/2014 5:19 AM, Jack Krupansky wrote:
> But, you say that you are using the stemmer to remove diacritical
> marks... you can/should use ASCIIFoldingFilterFactory or
> MappingCharFilterFactory.

I like ICUFoldingFilterFactory for this, but it does require additional
contrib jars (included in the Solr download).  It lowercases too.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory
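
A minimal sketch of a field type that uses this filter. The field type name text_folded is hypothetical, and the ICU jars from the analysis-extras contrib are assumed to be on the core's classpath.

```xml
<!-- schema.xml (sketch): folding diacritics and case with ICU -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- folds accents and other character variants, and lowercases -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Since the same analyzer is used at index and query time here, existing documents need to be re-indexed after switching a field to this type.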

Thanks,
Shawn



Re: can't make GET request to solr in android app

2014-05-05 Thread blach
Hi,
It's not an error. If you look at my code, there is a catch statement which
contains the "FAIL" message, and that message is always shown.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134709.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Core failure when a lot of processes are indexing

2014-05-05 Thread Hakim Benoudjit
The index was built with the same version of Solr that is searching it (4.6.0),
and the config file (solrconfig.xml) and schema.xml are the same too.
The only way I have found to avoid this issue is to let only one process index
at a time. Wouldn't a message queue layer resolve this issue?


2014-05-04 18:33 GMT+01:00 Shawn Heisey :

> On 5/4/2014 9:30 AM, Hakim Benoudjit wrote:
> > Ok. These files contain what you've requested:
> >
> > First (the xml error): http://pastebin.com/ZcagK3T7
> > Second (java params): http://pastebin.com/JtWQpp6s
> > Third (Solr version): http://pastebin.com/wYdpdsAW
>
> Are you running with an index originally built by an earlier version of
> Solr?  If you are, you may be running into a known bug.  The last
> "caused by" section of the java stacktrace looks similar to the one in
> this issue -- which is indeed index corruption:
>
> https://issues.apache.org/jira/browse/LUCENE-5377
>
> If that's the problem you're experiencing, upgrading your Solr version
> will hopefully fix it.  Simply dropping in the 4.6.1 war file and any
> contrib jars should cause zero problems for your 4.6.0 install.
> Upgrading to 4.7.2 or 4.8.0 should be done with more care.
>
> Thanks,
> Shawn
>
>


-- 
Hakim Benoudjit.


Re: can't make GET request to solr in android app

2014-05-05 Thread blach
thanks,

basically I'm running solr on my localhost(computer) and trying to access it
through the emulator in eclipse, NOT in the physical phone.

Can it be done?
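
It can, with two caveats, sketched here under assumptions. Inside the Android emulator, localhost refers to the emulated device itself; the emulator exposes the development machine's loopback interface as 10.0.2.2, so the Solr URL would look like http://10.0.2.2:8983/solr/corename (corename being a placeholder). The app also needs the INTERNET permission in its manifest. The package name and label below are hypothetical.

```xml
<!-- AndroidManifest.xml (sketch) -->
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.solrclient">

    <!-- Required for any network access, including to the host machine. -->
    <uses-permission android:name="android.permission.INTERNET"/>

    <application android:label="Solr client">
        <!-- activities go here -->
    </application>

</manifest>
```

On Android 3.0 and later, network calls made on the main thread raise NetworkOnMainThreadException, so the request should also run on a background thread; that may be related to an app that stops outright.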



--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134706.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Block Join Score & Highlighting

2014-05-05 Thread StrW_dev
I changed the hardcoded BlockJoinChildQParser setting to use the parent
scoring and that seems to work. So I think I got rid of the scoring issue
:).
I also voted for the issue!


I haven't found a solution for the highlighting issue yet, but I am
considering omitting highlighting for now, since it also makes the index grow
quickly: the fields need to be stored to support highlighting.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Block-Join-Score-Highlighting-tp4134045p4134702.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help to Understand a Solr Query

2014-05-05 Thread Jack Krupansky

Read up on the edismax query parser first:
http://wiki.apache.org/solr/ExtendedDisMax

The "^" operator is known as boosting or field boosting and is used to 
influence document scores for relevancy.


It has no analog in SQL.

-- Jack Krupansky

-Original Message- 
From: nativecoder

Sent: Monday, May 5, 2014 9:11 AM
To: solr-user@lucene.apache.org
Subject: Help to Understand a Solr Query

Hi All

I am completely new to solr and hoping to understand the basics. Can one of
you help me to understand what the following query does, in which order it
is getting executed

I understand that when this query is executed fields mentioned in fieldList
will be returned. What I don't understand is how the "samplestring1" and
"samplestring2" will get searched with the query fields specified

I think I will be able to understand how the search happens if this can be
illustrated in SQL ( Just to understand what happens behind the scene)

Following is the query. Please have a look at it and let me know how this
works internally.
query=samplestring1 AND samplestring2
defType: edismax
queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
fieldList: Column1, Column2
resultRows: 10
startRow: 0

P.S samplestring1 AND samplestring2  are some test strings in the query

Sample of Schema for fields

[field definitions stripped by the list archive]


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Help to Understand a Solr Query

2014-05-05 Thread nativecoder
Hi All

I am completely new to Solr and hoping to understand the basics. Can one of
you help me understand what the following query does and in which order it
is executed?

I understand that when this query is executed fields mentioned in fieldList
will be returned. What I don't understand is how the "samplestring1" and
"samplestring2" will get searched with the query fields specified

I think I will be able to understand how the search happens if this can be
illustrated in SQL ( Just to understand what happens behind the scene)

Following is the query. Please have a look at it and let me know how this
works internally.
query=samplestring1 AND samplestring2
defType: edismax
queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
fieldList: Column1, Column2
resultRows: 10
startRow: 0

P.S samplestring1 AND samplestring2  are some test strings in the query

Sample of Schema for fields

[field definitions stripped by the list archive]


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr does not recognize language

2014-05-05 Thread Victor Pascual
Hi there,

I'm indexing my documents using mysolr. I mainly generate a list of JSON
objects and then run: solr.update(documents_array,'json')


On Mon, May 5, 2014 at 1:08 PM, Ahmet Arslan  wrote:

> Hi Victor,
>
> How do you index your documents? Your last config looks correct. However,
> if you use the data import handler, for example, you need to add update.chain
> there too. The same goes for the extraction request handler if you are using Solr Cell.
>
>  class="org.apache.solr.handler.dataimport.DataImportHandler">
> 
>   /home/username/data-config.xml
>   langid
> 
>   
>
> By the way The URL
> http://localhost:8080/solr/update?commit=true&update.chain=langid was
> just an example and meant to feed xml update messages by POST method. Not
> to use in a browser.
>
> Ahmet
>
> On Monday, May 5, 2014 11:04 AM, Victor Pascual <
> vic...@mobilemediacontent.com> wrote:
>
> Thank you very much for your help, Ahmet.
>
> However, the language detection is still not working. :(
> My solrconfig.xml didn't contain that lst section inside the update
> requestHandler.
> That's the content I added:
>
>>  class="solr.XmlUpdateRequestHandler">
> >   
> > langid
> >   
> >
> >
>
>
> >class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
> >  
> >text
> >lang
> >  
> >
> >
> >   
> > 
>
> Now, your suggested query
> http://localhost:8080/solr/update?commit=true&update.chain=langid returns
>
> 
> >
> >0
> >14
> >
> >
> And there is still no lang field in my documents.
> Any idea what am I doing wrong?
>
>
>
>
> On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan  wrote:
>
> Hi,
> >
> >solr/update should be used, not /solr/select
> >
> >curl 'http://localhost:8983/solr/update?commit=true&update.chain=langid'
> >
> >By the way don't you have following definition in your solrconfig.xml?
> >
> > 
> >
> >   
> > langid
> >   
> >  
> >
> >
> >
> >
> >On Tuesday, April 29, 2014 4:50 PM, Victor Pascual <
> vic...@mobilemediacontent.com> wrote:
> >Hi Ahmet,
> >
> >thanks for your reply. Adding &update.chain=langid to my query doesn't
> >work: IP:8080/solr/select/?q=*%3A*&update.chain=langid
> >Regarding defining the chain in an UpdateRequestHandler... sorry for the
> >lame question but shall I paste those three lines to solrconfig.xml, or
> >shall I add them somewhere else?
> >
> >There is not UpdateRequestHandler in my solrconfig.
> >
> >Thanks!
> >
> >
> >
> >On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan  wrote:
> >
> >> Hi,
> >>
> >> Did you attach your chain to a UpdateRequestHandler?
> >>
> >> You can do it by adding &update.chain=langid to the URL or defining it
> in
> >> a defaults section as follows
> >>
> >> 
> >>  langid
> >>
> >>
> >>
> >>
> >> On Tuesday, April 29, 2014 3:18 PM, Victor Pascual <
> >> vic...@mobilemediacontent.com> wrote:
> >> Dear all,
> >>
> >> I'm a new user of Solr. I've managed to index a bunch of documents (in
> >> fact, they are tweets) and everything works quite smoothly.
> >>
> >> Nevertheless it looks like Solr doesn't detect the language of my
> documents
> >> nor remove stopwords accordingly so I can extract the most frequent
> terms.
> >>
> >> I've added this piece of XML to my solrconfig.xml as well as the Tika
> lib
> >> jars.
> >>
> >> 
> >> >>
> >>
> class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
> >>   
> >> text
> >> lang
> >>   
> >> 
> >> 
> >>
> >>  
> >>
> >> There is no error in the tomcat log file, so I have no clue of why this
> >> isn't working.
> >> Any hint on how to solve this problem will be much appreciated!
> >>
> >
> >
>


Explain Solr Query Execution

2014-05-05 Thread nativecoder
How will a query like the one below get executed, and in which order?

I understand that when this query is executed, the fields listed in fieldList
will be returned. What I don't understand is how "samplestring1" and
"samplestring2" are searched against the query fields specified.

I think I would understand how the search happens if it could be illustrated
in SQL (just to understand what happens behind the scenes).

The query is below. Please have a look at it and let me know how it works
internally.
query=samplestring1 AND samplestring2
defType: edismax
queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
fieldList: Column1, Column2
resultRows: 10
startRow: 0

P.S. samplestring1 and samplestring2 are just test strings in the query.
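
As a rough mental model (not Solr's actual parser), edismax with an explicit AND turns each term into one required DisjunctionMax clause that spans every field in queryFields; each clause scores as the best-matching field, and documents are ranked by the combined score. The Python sketch below only prints that structure plus a loose pseudo-SQL analogy. The helper names `parse_qf`, `expand`, and `sql_analogy`, the `MATCHES` keyword, and the table name `index` are all made up for illustration.

```python
# Illustrative sketch only: mimics the shape of the query edismax builds,
# not Solr's real parser. All helper names here are hypothetical.

def parse_qf(qf):
    """Split 'Field^boost' entries into (field, boost) pairs."""
    pairs = []
    for entry in qf.split():
        field, _, boost = entry.partition("^")
        pairs.append((field, float(boost) if boost else 1.0))
    return pairs

def expand(terms, qf):
    """One required DisjunctionMax clause per term, spanning every qf field."""
    fields = parse_qf(qf)
    clauses = []
    for term in terms:
        alternatives = " | ".join(f"{f}:{term}^{b}" for f, b in fields)
        clauses.append(f"+({alternatives})")
    return " ".join(clauses)

def sql_analogy(terms, qf, fl):
    """Very loose pseudo-SQL analogy: matching only, no relevance scoring."""
    fields = [f for f, _ in parse_qf(qf)]
    per_term = [
        "(" + " OR ".join(f"{f} MATCHES '{t}'" for f in fields) + ")"
        for t in terms
    ]
    return f"SELECT {fl} FROM index WHERE " + " AND ".join(per_term) + " LIMIT 10 OFFSET 0"

QF = "Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7"
TERMS = ["samplestring1", "samplestring2"]

if __name__ == "__main__":
    print(expand(TERMS, QF))
    print(sql_analogy(TERMS, QF, "Column1, Column2"))
```

The SQL form hides two things that matter in practice: each field applies its own analyzer to the term before matching, and the boosts only influence ranking, not which documents match.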

Sample of Schema for fields


















--
View this message in context: 
http://lucene.472066.n3.nabble.com/Explain-Solr-Query-Execution-tp4134681.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Not Searching while INDEXING the DATA

2014-05-05 Thread Sohan Kalsariya
I am not able to search the data while indexing is in progress.
Indexing is done via the DataImportHandler.
When I search for documents while indexing is running, Solr throws a
broken pipe exception and returns no results.
What is the proper solution for this problem?
Am I missing something?
Help me!

-- 
Regards,
*Sohan Kalsariya*


RE: Wildcard malfunctioning

2014-05-05 Thread Román González
SOLVED!

The first solution I tried (Ahmet's) worked fine!

Thank you!

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, May 5, 2014 13:19
To: solr-user@lucene.apache.org; rgonza...@normagricola.com
Subject: Re: Wildcard malfunctioning

Generally, stemming filters are not supported when wildcards are present. 
Only a small subset of filters work with wildcards, such as the case conversion 
filters.

But you say that you are using the stemmer to remove diacritical marks...
you can/should use ASCIIFoldingFilterFactory or MappingCharFilterFactory.

-- Jack Krupansky


Re: Wildcard malfunctioning

2014-05-05 Thread Jack Krupansky
Generally, stemming filters are not supported when wildcards are present. 
Only a small subset of filters work with wildcards, such as the case 
conversion filters.


But you say that you are using the stemmer to remove diacritical marks...
you can/should use ASCIIFoldingFilterFactory or MappingCharFilterFactory.
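
To make the mismatch concrete, here is a toy Python model of what probably happens. It assumes (inferred only from the results reported in this thread) that the stemmer shortens "naranja" at index time but leaves the short word "uva" alone, and that wildcard terms bypass stemming. The stemmer below is a deliberate simplification and not the real SpanishLightStemFilter; all function names are hypothetical.

```python
# Toy model of why a prefix query can miss stemmed terms. The stemmer below
# is a deliberate simplification, NOT the real SpanishLightStemFilter.

def toy_light_stem(token):
    """Strip one trailing vowel from tokens of five or more characters."""
    if len(token) >= 5 and token[-1] in "oae":
        return token[:-1]
    return token

def build_index(words):
    """Index time: every token passes through the stemmer."""
    return {toy_light_stem(w) for w in words}

def term_query(index, word):
    """Plain term queries are analyzed, so the query word is stemmed too."""
    return toy_light_stem(word) in index

def prefix_query(index, pattern):
    """Wildcard queries skip stemming: the raw prefix is compared as typed."""
    prefix = pattern.rstrip("*")
    return any(term.startswith(prefix) for term in index)

index = build_index(["uva", "naranja"])

if __name__ == "__main__":
    for q in ("uva", "uva*", "naranja", "naranja*", "naranj*"):
        hit = prefix_query(index, q) if q.endswith("*") else term_query(index, q)
        print(q, "->", "match" if hit else "no match")
```

In this model the index holds "naranj", so the raw prefix "naranja" has nothing to match, while "uva" is stored unchanged and both of its queries succeed.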


-- Jack Krupansky


Re: Wildcard malfunctioning

2014-05-05 Thread Ahmet Arslan


Hi Roman,

What you are experiencing is expected and known. Stemming and wildcard
searches can interact in counterintuitive ways, but luckily a remedy is
available. Use the following filters, and your wildcard searches will be
happy. Please note that this change requires a Solr restart and a re-index.

 
 
 

Regarding diacritics, please see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory
and http://wiki.apache.org/solr/MultitermQueryAnalysis
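
For the diacritics part, the idea is to fold accents the same way on the indexed terms and on the wildcard prefix, so that no stemmer is needed just to ignore accents. The Python sketch below approximates that with Unicode decomposition. It is not how ASCIIFoldingFilterFactory is implemented, the function names are hypothetical, and note that this sketch also folds "ñ" to "n", which may or may not be what you want for Spanish.

```python
import unicodedata

def fold_diacritics(text):
    """Decompose characters, then drop the combining marks (accents, tildes)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def prefix_match(indexed_terms, pattern):
    """Fold and lowercase both sides before comparing the wildcard prefix."""
    prefix = fold_diacritics(pattern.rstrip("*").lower())
    return [t for t in indexed_terms if fold_diacritics(t.lower()).startswith(prefix)]

if __name__ == "__main__":
    terms = ["melocotón", "melón", "limón"]
    print(prefix_match(terms, "melo*"))
    print(prefix_match(terms, "MELÓ*"))
```

Because folding is a per-character mapping, applying it to a partial word gives a prefix of the folded full word, which is why this kind of filter can be applied safely to wildcard terms while a stemmer cannot.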

Ahmet



Re: Solr does not recognize language

2014-05-05 Thread Ahmet Arslan
Hi Victor,

How do you index your documents? Your last config looks correct. However, if
you use the data import handler, for example, you need to add update.chain
there too. The same goes for the extracting request handler if you are using
Solr Cell.


    
      /home/username/data-config.xml
      langid
    
  

By the way, the URL
http://localhost:8080/solr/update?commit=true&update.chain=langid was just an
example. It is meant for feeding XML update messages via the POST method, not
for opening in a browser.
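
A minimal sketch of what "feeding an XML update message by POST" can look like from Python. It only builds the request and does not send it. The host and port are taken from the URL in this thread; the sample document, its field values, and the function name `build_update_request` are assumptions for illustration, with `text` as the field to be language-detected as in the config above.

```python
# Sketch: build (but do not send) an XML update request that goes through the
# langid chain. Host, port, and the sample document are assumptions.
from urllib.parse import urlencode
from urllib.request import Request
from xml.sax.saxutils import escape, quoteattr

SOLR_UPDATE_URL = "http://localhost:8080/solr/update"  # assumed, as in the thread

def build_update_request(docs, chain="langid"):
    """Return a POST Request carrying an <add> message for the given docs."""
    doc_xml = []
    for doc in docs:
        fields = "".join(
            f"<field name={quoteattr(name)}>{escape(value)}</field>"
            for name, value in doc.items()
        )
        doc_xml.append(f"<doc>{fields}</doc>")
    body = f"<add>{''.join(doc_xml)}</add>".encode("utf-8")
    url = SOLR_UPDATE_URL + "?" + urlencode({"commit": "true", "update.chain": chain})
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "text/xml; charset=utf-8"})

if __name__ == "__main__":
    req = build_update_request([{"id": "1", "text": "Hola, esto es un tweet de prueba"}])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)  (requires a running Solr)
```

Opening the bare URL in a browser sends a request with no documents in it, so the chain has nothing to process; documents have to be (re)posted through the chain for a language field to be added to them.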

Ahmet



interpretation of cat_rank in http://people.apache.org/~hossman/ac2012eu/

2014-05-05 Thread Matteo Grolla
Hi everybody,
Can anyone give me a suitable interpretation of cat_rank in
http://people.apache.org/~hossman/ac2012eu/ (slide 15)?

Thanks

Wildcard malfunctioning

2014-05-05 Thread Román González
Hi all!

 

Sorry in advance if this question has already been posted, but I was unable
to find it with search engines.

 

The SpanishLightStemFilterFactory filter is not working properly with
wildcards, or I’m misunderstanding something. I have this field:

 

   

 

With this type:

 



   











  



 

But I’m getting these results:

q = cultivo_es:uva
Getting 50 correct results

q = cultivo_es:uva*
Getting the same 50 correct results

q = cultivo_es:naranja
Getting the 50 correct results for “naranja”

q = cultivo_es:naranja*
Getting 0 results!

 

It works fine if I remove SpanishLightStemFilterFactory filter, but I need
it in order to filter diacritics according to Spanish rules.

 

Thank you!!

 



Re: Solr does not recognize language

2014-05-05 Thread Frankcis
Because if both files are not encoded as UTF-8, building the index will
produce garbled text, and of course you will not get the expected result.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-does-not-recognize-language-tp4133711p4134647.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr does not recognize language

2014-05-05 Thread Victor Pascual
Why should this be a problem?
Both files start with 


On Mon, May 5, 2014 at 11:44 AM, Frankcis  wrote:

> I think you should check that your schema.xml and solrconfig.xml are
> encoded as UTF-8.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-does-not-recognize-language-tp4133711p4134643.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr does not recognize language

2014-05-05 Thread Frankcis
I think you should check that your schema.xml and solrconfig.xml are
encoded as UTF-8.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-does-not-recognize-language-tp4133711p4134643.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr does not recognize language

2014-05-05 Thread Victor Pascual
Thank you very much for your help, Ahmet.

However, the language detection is still not working. :(
My solrconfig.xml didn't contain that lst section inside the update
requestHandler.
This is the content I added:

 class="solr.XmlUpdateRequestHandler">
>
>  langid
>
> 
>


>
> class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>   
> text
> lang
>   
> 
> 
>
>  


Now, your suggested query
http://localhost:8080/solr/update?commit=true&update.chain=langid returns


> 
> 0
> 14
> 
> 

And there is still no lang field in my documents.
Any idea what I am doing wrong?



On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan  wrote:

> Hi,
>
> solr/update should be used, not /solr/select
>
> curl 'http://localhost:8983/solr/update?commit=true&update.chain=langid'
>
> By the way don't you have following definition in your solrconfig.xml?
>
>  
>
>  langid
>
>   
>
>
>
> On Tuesday, April 29, 2014 4:50 PM, Victor Pascual <
> vic...@mobilemediacontent.com> wrote:
> Hi Ahmet,
>
> thanks for your reply. Adding &update.chain=langid to my query doesn't
> work: IP:8080/solr/select/?q=*%3A*&update.chain=langid
> Regarding defining the chain in an UpdateRequestHandler... sorry for the
> lame question but shall I paste those three lines to solrconfig.xml, or
> shall I add them somewhere else?
>
> There is no UpdateRequestHandler in my solrconfig.
>
> Thanks!
>
>
>
> On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan  wrote:
>
> > Hi,
> >
> > Did you attach your chain to an UpdateRequestHandler?
> >
> > You can do it by adding &update.chain=langid to the URL or defining it in
> > a defaults section as follows
> >
> > 
> >  langid
> >
> >
> >
> >
> > On Tuesday, April 29, 2014 3:18 PM, Victor Pascual <
> > vic...@mobilemediacontent.com> wrote:
> > Dear all,
> >
> > I'm a new user of Solr. I've managed to index a bunch of documents (in
> > fact, they are tweets) and everything works quite smoothly.
> >
> > Nevertheless it looks like Solr doesn't detect the language of my
> documents
> > nor remove stopwords accordingly so I can extract the most frequent
> terms.
> >
> > I've added this piece of XML to my solrconfig.xml as well as the Tika lib
> > jars.
> >
> > 
> > >
> >
> class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
> >   
> > text
> > lang
> >   
> > 
> > 
> >
> >  
> >
> > There is no error in the tomcat log file, so I have no clue of why this
> > isn't working.
> > Any hint on how to solve this problem will be much appreciated!
> >
>
>


stats pse-udo field score

2014-05-05 Thread frank shi
Hey everyone, in our application we are using Solr 4.6.
I had the idea of using the stats component on the score pseudo-field.
Is there a workaround that allows something like "…stats=true&stats.field=score..."?
Thanks a lot!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/stats-pse-udo-field-score-tp4134635.html
Sent from the Solr - User mailing list archive at Nabble.com.