Re: Guideline on when a field absolutely needs to be stored?

2018-01-17 Thread Clemens Wyss DEV
THX!

-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Thursday, January 18, 2018 08:27
To: solr-user@lucene.apache.org
Subject: Re: Guideline on when a field absolutely needs to be stored?

There is a nice table for all of the field options.

https://lucene.apache.org/solr/guide/7_2/field-properties-by-use-case.html

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jan 17, 2018, at 11:23 PM, Clemens Wyss DEV  wrote:
> 
> Kind of "basic question" ... Am I right, that the only real reason to store a 
> field (stored="true") is when I want to fetch the "originating value" from 
> documents returned? 
> 
> What about 
> geo-location-fields?
> Any other reason/(search-)function requiring a field being stored?
> 
> Thx
> Clemens
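
As a concrete illustration of the retrieval case discussed in this thread (collection and
field names below are placeholders, not from the original mails): only fields marked
stored="true" -- or docValues="true" fields returned via useDocValuesAsStored -- come back
in the fl list of a query response, e.g.:

  curl 'http://localhost:8983/solr/mycollection/select?q=*:*&fl=id,title&rows=1'

If "title" is indexed but neither stored nor backed by docValues, it remains searchable
yet is silently missing from the returned documents.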



Re: Guideline on when a field absolutely needs to be stored?

2018-01-17 Thread Walter Underwood
There is a nice table for all of the field options.

https://lucene.apache.org/solr/guide/7_2/field-properties-by-use-case.html

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jan 17, 2018, at 11:23 PM, Clemens Wyss DEV  wrote:
> 
> Kind of "basic question" ... Am I right, that the only real reason to store a 
> field (stored="true") is when I want to fetch the "originating value" from 
> documents returned? 
> 
> What about 
> geo-location-fields?
> Any other reason/(search-)function requiring a field being stored?
> 
> Thx
> Clemens



Guideline on when a field absolutely needs to be stored?

2018-01-17 Thread Clemens Wyss DEV
Kind of "basic question" ... Am I right, that the only real reason to store a 
field (stored="true") is when I want to fetch the "originating value" from 
documents returned? 

What about 
geo-location-fields?
Any other reason/(search-)function requiring a field being stored?

Thx
Clemens


Re: Partial results from streaming expressions (i.e. making them "stream")

2018-01-17 Thread Radu Gheorghe
Hi Joel, thanks for your follow-up!

Indeed, that's my experience as well - that the export handler streams
data fast enough. Though now that you mention batches, I'm curious if
that batch size is configurable or something like that.

The higher level issue is that I need to show results to the user as
quickly as possible. For example, imagine a rollup on a relatively
high cardinality field, but with lots of documents per user as well. I
want to show counters as soon as they come up, instead of when I have
all of them.
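
As a sketch of that scenario (reusing the test collection and foo_s field from the curl
example quoted further down this thread; everything else is hypothetical), such a rollup
could be expressed as:

  curl --data-urlencode 'expr=rollup(
          search(test, q="foo_s:*", fl="foo_s", sort="foo_s asc", qt="/export"),
          over="foo_s",
          count(*))' \
       http://localhost:8983/solr/test/stream

The per-value counters produced by the rollup are exactly the partial results one would
like to see arrive incrementally rather than only once the whole rollup has finished.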

To have results coming in as quickly as possible, I need data to be
streamed quickly (latency-wise) between source and decorators, as well
as from the Solr node receiving the initial request to the client
(UI).

The first part seems to already be happening in my tests (though I've
heard complaints that it doesn't - I'll come back to it if I
misunderstood something), but I can't get partial results to the HTTP
client issuing the original requests.

Does this clarify my issue?

Thanks again and best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Jan 17, 2018 at 8:59 PM, Joel Bernstein  wrote:
> I'm not sure I understand the issue fully. From a streaming standpoint, you
> get real streamed data from the /export handler. When you use the export
> handler the bitset for the search results is materialized in memory, but
> all results are sorted/streamed in batches. This allows the export handler
> to export result sets of any size.
>
> The underlying buffer sizes are really abstracted away and not meant to be
> dealt with.
>
> What's the higher level issue you are concerned with?
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Jan 17, 2018 at 8:54 AM, Radu Gheorghe 
> wrote:
>
>> Hello,
>>
>> I have some updates on this, but it's still not very clear for me how
>> to move forward.
>>
>> The good news is, that between sources and decorators, data seems to
>> be really streamed. I hope I tested this the right way, by simply
>> adding a log message to ReducerStream saying "hey, I got this tuple".
>> Now I have two nodes, nodeA with data and nodeB with a dummy
>> collection. If I hit nodeB's /stream endpoint and ask it for, say, a
>> unique() to wrap the previously mentioned expression (with 1s sleeps),
>> I see a log from ReducerStream every second. Good.
>>
>> Now, the final result (to the client, via curl), only gets to me after
>> N seconds (where N is the number of results I get). I did some more
>> digging on this front, too. Let's assume we have chunked encoding
>> re-enabled (that's a must) and no other change (if I flush() the
>> FastWriter, say, after every tuple, then I get every tuple as it's
>> computed, but I'm trying to explore the buffers). I've noticed the
>> following:
>> - the first response comes after ~64K, then I get chunks of 32K each
>> - at this point, if I set response.setBufferSize() in
>> HttpSolrCall.writeResponse() to a small size (say, 128), I get the
>> first reply after 32K and then 8K chunks
>> - I thought that maybe in this context I could lower BUFSIZE in
>> FastWriter, but that didn't seem to make any change :(
>>
>> That said, I'm not sure it's worth looking into these buffers any
>> deeper, because shrinking them might negatively affect other results
>> (e.g. regular searches or facets). It sounds like the way forward
>> would be that manual flushing, with chunked encoding enabled. I could
>> imagine adding some parameters along the lines of "flush every N tuples
>> or M milliseconds", that would be computed per-request, or at least
>> globally to the /stream handler.
>>
>> What do you think? Would such a patch be welcome, to add these
>> parameters? But it still requires chunked encoding - would reverting
>> SOLR-8669 be a problem? Or maybe there's a more elegant way to enable
>> chunked encoding, maybe only for streams?
>>
>> Best regards,
>> Radu
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>> On Mon, Jan 15, 2018 at 10:58 AM, Radu Gheorghe
>>  wrote:
>> > Hello fellow solr-users!
>> >
>> > Currently, if I do an HTTP request to receive some data via streaming
>> > expressions, like:
>> >
>> > curl --data-urlencode 'expr=search(test,
>> >q="foo_s:*",
>> >fl="foo_s",
>> >sort="foo_s asc",
>> >qt="/export")'
>> > http://localhost:8983/solr/test/stream
>> >
>> > I get all results at once. This is more obvious if I simply introduce
>> > a one-second sleep in CloudSolrStream: with three documents, the
>> > request takes about three seconds, and I get all three docs after
>> > three seconds.
>> >
>> > Instead, I would like to get documents in a more "streaming" way. 

Re: Adding a child doc incrementally

2018-01-17 Thread Gus Heck
If the document routing can be arranged such that the children and the
parent are always co-located in the same shard, and share an identifier,
the graph query can pull back the parent plus any arbitrary number of
"children" that have been added at any time in any order. In this scheme
"children" are just things that match your graph query... (
https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-GraphQueryParser)
However, if your query has to cross shards, that won't work (yet...
https://issues.apache.org/jira/browse/SOLR-11384).

More info here:
https://www.slideshare.net/lucidworks/solr-graph-query-presented-by-kevin-watters-kmw-technology
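
A minimal sketch of such a query (field names are hypothetical -- it assumes children
carry a parent_id field holding the parent's id; verify the from/to orientation against
the GraphQueryParser documentation linked above):

  q={!graph from=id to=parent_id}id:parent-1

This starts from the parent document and walks to every co-located document whose
parent_id matches, regardless of when those "children" were indexed.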

On Mon, Jan 15, 2018 at 2:09 PM, S G  wrote:

> Hi,
>
> We have a use-case where a single document can contain thousands of child
> documents.
> However, I could not find any way to do it incrementally.
> The only way is to read the full document from Solr, add the new child document
> to it and then re-index the full document with all of its child documents
> again.
> This causes a lot of reads from Solr just to form the document with one extra
> document.
> Ideally, I would have liked to only send the parent-ID and the
> child-document only as part of an "incremental update" command to Solr.
>
> Is there a way to incrementally add a child document to a parent document?
>
> Thanks
> SG
>



-- 
http://www.the111shift.com


Re: Doubts regarding usage of block join query

2018-01-17 Thread Aashish Agarwal
So is there any way to solve the above problem?

On Jan 18, 2018 1:31 AM, "Mikhail Khludnev"  wrote:

> sure
>
> On Wed, Jan 17, 2018 at 9:39 PM, Aashish Agarwal 
> wrote:
>
> > Hello,
> > I tried to use the block join query feature in Solr 4.6.0. My data is in
> > a database, but since 4.6 does not support DIH with child=true, I created
> > the CSV ordered as a list of children followed by the parent.
> > I used the csv to import data as described in
> > https://gist.github.com/mkhludnev/6406734#file-t-shirts-xml
> > So, the _root_ value is created by me, not internally by Solr. Would that
> > create a problem?
> >
> > Please help.
> >
> > Thanks,
> > Aashish
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Spatial search - indexing WKT data

2018-01-17 Thread Gus Heck
It's been a while since I did it, but I'm pretty sure that when I indexed
polygons a couple years ago, I just sent WKT text for the field value... I
think I do recall some niggle where there was some slight mismatch in the WKT
accepted by the JavaScript library I wanted to use and Solr. (One was
slightly more permissive about something.) As for fields, I was using RPT
fields, which are more appropriate for arbitrary polygons, I believe... also
note you may need to add JTS to Solr as described here:
https://lucene.apache.org/solr/guide/7_2/spatial-search.html#jts-and-polygons-flat

I do think the Solr docs really should have examples of indexing these more
interesting field types. Presently the docs are pretty query focused...
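
For reference, sending WKT as the field value can be as simple as a JSON update
(collection and field names here are placeholders; geo_rpt is assumed to be an RPT-based
field type with JTS on the classpath, per the link above):

  curl -H 'Content-Type: application/json' \
       'http://localhost:8983/solr/spatialtest/update?commit=true' \
       -d '[{"id":"poly1","geo_rpt":"POLYGON((30 10, 10 20, 20 40, 40 40, 30 10))"}]'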

On Sun, Jan 14, 2018 at 5:53 PM, Leila Deljkovic <
leila.deljko...@koordinates.com> wrote:

> Hi,
>
> I have some data in WKT string format (either POLYGON or MULTIPOLYGON) and
> I’d like to index it in Solr 7.0. As there are multiple polygons in every
> WKT string, I’d ideally like to index them as a multiValued BBoxField (couldn’t
> find anywhere to confirm, but it looks like multiValued is a valid
> attribute for BBoxField). Has anyone indexed WKT data in Solr before? Is it
> necessary to convert it to CSV (I would do that first but I’m having
> trouble exporting it as CSV…)?
>
> Thanks




-- 
http://www.the111shift.com


Re: Solr Exception: Undefined Field

2018-01-17 Thread Rick Leir
Deepak
Would you like to write your post again without asterisks? Include the 
asterisks which are necessary to the query of course.
Rick

On January 17, 2018 1:10:28 PM EST, Deepak Goel  wrote:
>*Hello*
>
>*In Solr Admin: I type the q parameter as - *
>
>*text_entry:**
>
>*It gives the following exception (In the schema I do see a field as
>text_entry):*
>
>{ "responseHeader":{ "zkConnected":true, "status":400, "QTime":2,
>"params":{
>"q":"text_entry:*", "_":"1516190134181"}}, "error":{ "metadata":[
>"error-class","org.apache.solr.common.SolrException",
>"root-error-class",
>"org.apache.solr.common.SolrException"], "msg":"undefined field
>text_entry",
>"code":400}}
>
>
>*However when i type the q paramter as -*
>
>*{!term f=text_entry}henry*
>
>*This does give out the output as foll:*
>
>{ "responseHeader":{ "zkConnected":true, "status":0, "QTime":0,
>"params":{ "
>q":"{!term f=text_entry}henry", "_":"1516190134181"}},
>"response":{"numFound
>":262,"start":0,"docs":[ { "type":"line", "line_id":"80075",
>"play_name":"Richard
>II", "speech_number":"13", "line_number":"3.3.37", "speaker":"HENRY
>BOLINGBROKE", "text_entry":"Henry Bolingbroke", "id":
>"9428c765-a4e8-4116-937a-9b70e8a8e2de",
>"_version_":1588569205789163522, "
>speaker_str":["HENRY BOLINGBROKE"], "text_entry_str":["Henry
>Bolingbroke"],
>"line_number_str":["3.3.37"], "type_str":["line"],
>"play_name_str":["Richard
>II"]}, {
>**
>
>Any ideas what is going wrong in the first q?
>
>Thank You
>
>Deepak
>"Please stop cruelty to Animals, help by becoming a Vegan"
>+91 73500 12833
>deic...@gmail.com
>
>Facebook: https://www.facebook.com/deicool
>LinkedIn: www.linkedin.com/in/deicool
>
>"Plant a Tree, Go Green"
>
>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com

Re: trivia question: why q=*:* doesn't return same result as q.alt=*:*

2018-01-17 Thread Nawab Zada Asad Iqbal
Chris / Hoss

Thanks for the detailed explanation. Erick Erickson's explanation made
sense to me but it didn't explain why the fields are different for
'hello' vs '*:*'.

I had never paid much attention to the parser part of query handling and so
far focused only on the field definitions. I had to re-read parts of this
thread to understand the whole picture.

I had dropped an apparently unnecessary question but this thread has
provided a lot of necessary learning.


Thanks
Nawab

On Fri, Jan 12, 2018 at 10:38 AM, Chris Hostetter 
wrote:

>
> : defType=dismax does NOT do anything special with *:* other than treat it
> ...
> : > As Chris explained, this is special:
> ...
>
> I'm interpreting your followup question differently than Erick & Erik
> did.  I'm going to assume both E & E misunderstood your question, and I'm
> going to assume you completely understood my response to your original
> question.
>
> I'm going to assume that a way to reword/expand your followup question is
> something like this...
>
> "I understand now that defType=dismax doesn't support special syntax like
> '*:*' and treats that 3-character input as just another 3-character string to search
> against the qf & pf fields -- but now what I don't understand is why the
> list of fields in the debug query output is different for 'q=*:*' compared
> to something like 'q=hello'"
>
> (If i have not understood your followup question correctly, please
> clarify)
>
> Let's look at those outputs you mentioned...
>
> : >> http://localhost:8983/solr/filesearch/select?fq=id:1193;
> : >> q=*:*=true
> : >>
> : >>
> : >>   - parsedquery: "+DisjunctionMaxQuery((user_email:*:* |
> user_name:*:* |
> : >>   tags:*:* | (name_shingle_zh-cn:, , name_shingle_zh-cn:, ,) |
> : >> id:*:*)~0.01)
> : >>   DisjunctionMaxQuery(((name_shingle_zh-cn:", , , ,"~100)^100.0 |
> : >>   tags:*:*)~0.01)",
> ...
> : >> e.g. following query uses the my expected set of pf and qf.
> ...
> : >> http://localhost:8983/solr/filesearch/select?fq=id:1193;
> : >> q=hello=true
> : >>
> : >>
> : >>
> : >>   - parsedquery: "+DisjunctionMaxQuery(((name_token:hello)^60.0 |
> : >>   user_email:hello | (name_combined:hello)^10.0 |
> (name_zh-cn:hello)^10.0
> : >> |
> : >>   name_shingle:hello | comments:hello | user_name:hello |
> : >> description:hello |
> : >>   file_content_zh-cn:hello | file_content_de:hello | tags:hello |
> : >>   file_content_it:hell | file_content_fr:hello | file_content_es:hell
> |
> : >>   file_content_en:hello | id:hello)~0.01)
> : >> DisjunctionMaxQuery((description:hello
> : >>   | (name_shingle:hello)^100.0 | comments:hello | tags:hello)~0.01)",
>
>
> The answer has to do with the list of qf & pf fields you have configured
> -- you didn't provide us with concrete specifics of what qf/pf you
> have configured in your requestHandler -- but you did mention in your
> second example that "following query uses the my expected set of pf and
> qf"
>
> By comparing the 2 examples at a glance, it appears that the fields in the
> first example (q=*:* ... again, searching for the literal 3 character
> string '*:*') are (mostly) a subset of the fields you "expected" (from the
> 2nd example)
>
> I'm fairly certain that what's happening here is that in both examples the
> literal string input is being given to the analyzer for all of your fields
> -- but in the case of the (literal) string '*:*' many of the analyzers are
> producing no terms at all -- ie: they are completely stripping out
> punctuation -- so they don't appear in the final query.
>
> IIUC it looks like one other oddity here is that the reverse also
> seems to be true in some cases -- I suspect that
> although "name_shingle_zh-cn" doesn't appear in your 2nd example, it
> probably *is* in your pf param but whatever analyzer you have configured
> for it produces no tokens for the Latin characters "hello" but does
> produce tokens for the pure-punctuation characters "*:*"
>
>
> (If I'm correct about your question, but wrong about your qf/pf then
> please provide us with a lot more details -- notably your full
> schema/solrconfig used when executing those queries.)
>
>
> -Hoss
> http://www.lucidworks.com/
>
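
Tying this back to the thread subject: with defType=dismax the usual way to match all
documents is to leave q out and pass q.alt=*:*, which is parsed by the standard Lucene
query parser instead of being analyzed against qf/pf. A sketch reusing the collection and
filter from the examples above:

  curl 'http://localhost:8983/solr/filesearch/select?defType=dismax&q.alt=*:*&fq=id:1193&debugQuery=true'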


Re: Solr 7 spatial search and WKT

2018-01-17 Thread Leila Deljkovic
Hi Emir

I’ve been following one of the only examples I could find on how to index a 
POLYGON, which does specify the field as multiValued:

Configuration: schema.xml

 
Index a polygon (JavaScript syntax around WKT):
{"id":"1", "geo_rpt":
"POLYGON((30 10, 10 20, 20 40, 40 40, 30 10))”}

Indexing one MULTIPOLYGON works also, but trying to enter them as a list like 
you would for any other multiValued field does not work. I couldn’t find it stated 
explicitly that RptWithGeometrySpatialField supports multiValued, but according 
to the Solr docs, it is derived from SpatialRecursivePrefixTreeFieldType (RPT) 
which supports multiValued and is “configured just like RPT except that the 
default distErrPct is 0.15 (higher than 0.025)…”

The reason I’m trying to index multiple shapes per document is that in the 
index, each “layer” (document) has grid cells associated with it (they help 
describe the density of features across the layer; more density = smaller grid 
cells in an area). Indexing the grid cells will allow me to figure out how 
relevant a result for a search extent on a map might be; a layer could cover an 
entire country but be dense in major cities, so if I am looking for a major 
city, I’d want to boost this search result. Hope that makes sense. I’m not sure 
if flattening into a single shape would work for this purpose.

Thanks :)

> On 17/01/2018, at 10:12 PM, Emir Arnautović  
> wrote:
> 
> Hi Leila,
> I haven’t been using spatial in a while and did not test this, but based on 
> the error, it seems that multivalue is not supported for this field type. Can you 
> index a single MULTIPOLYGON? Why do you need to have multiple values? Can you 
> flatten your geometry to a single MULTIPOLYGON or MULTIGEOMETRY (if supported)? 
> Can you explain why you need to have a multiValued field?
> 
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 17 Jan 2018, at 00:09, Leila Deljkovic  
>> wrote:
>> 
>> Hi all,
>> 
>> I need to index multiple POLYGONS/MULTIPOLYGONS per document; I’m trying to 
>> use multiValued RptWithGeometrySpatialField and I’m getting this error:
>> 
>> Exception writing document id leila_test to the index; possible analysis 
>> error: DocValuesField "gridcell_rpt" appears more than once in this document 
>> (only one value is allowed per field)
>> 
>> This is what I’m indexing:
>> {
>>  "id": "leila_test",
>>  "gridcell_rpt": ["POLYGON((30 10, 10 20, 20 40, 40 40, 30 10))”, 
>> "MULTIPOLYGON(((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 
>> 5)))"]
>> }
>> 
>> This is what’s in my schema.xml:
>>  
>>  …
>>  >   
>>  distanceUnits=”kilometers” autoIndex="true”/>
>> 
>> I’m pretty confused on why this isn’t working. I can’t find an example of 
>> multiValued RptWithGeometrySpatialField anywhere -_-
>> 
>> Thanks :)
> 



Re: Doubts regarding usage of block join query

2018-01-17 Thread Mikhail Khludnev
sure

On Wed, Jan 17, 2018 at 9:39 PM, Aashish Agarwal 
wrote:

> Hello,
> I tried to use the block join query feature in Solr 4.6.0. My data is in
> a database, but since 4.6 does not support DIH with child=true, I created
> the CSV ordered as a list of children followed by the parent.
> I used the csv to import data as described in
> https://gist.github.com/mkhludnev/6406734#file-t-shirts-xml
> So, the _root_ value is created by me, not internally by Solr. Would that
> create a problem?
>
> Please help.
>
> Thanks,
> Aashish
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Querying on sum of child documents

2018-01-17 Thread Mikhail Khludnev
Hello,
It should be something like
{!parent ... score=total}+description:support +exp:[3 TO 7] {!func}exp
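
Spelled out as a full request, that could look roughly like the following (collection
name is a placeholder; which="isParent:1" follows the sample documents quoted below; the
{!frange} filter over query($q) is one possible way to keep only parents whose summed
child score falls in the range, and is a sketch rather than a tested query):

  curl http://localhost:8983/solr/mycollection/select \
    --data-urlencode 'q={!parent which="isParent:1" score=total}+description:payroll {!func}exp' \
    --data-urlencode 'fq={!frange l=7 u=10}query($q)' \
    --data-urlencode 'fl=id,score'

With score=total the parent's score is the sum of exp over the children matching the
description clause, which is what the range filter is then applied to.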

On Wed, Jan 17, 2018 at 3:05 PM, Prathyusha Kondeti  wrote:

> Hi,
>  I have the following indexed documents
> {
>"id":"data1",
>"isParent":"1",
>"_childDocuments":[
>   {
>  "description":
> "Benefit and Payroll consultant with HR team support ",
>  "isParent":"2",
>  "exp":2
>   },
>   {
>  "description":" ERP Avalon Implementation and Support Payroll",
>  "isParent":"2",
>  "exp":5
>   },
>   {
>  "description":" lucene solr",
>  "isParent":"2",
>  "exp":2
>   }
>]
>
> > }
>
>
>
> How can I form a query as
> *select?q=**:**=(description:Payroll AND sum(exp):[7  TO  10]) AND
> **(description:support
> AND sum(exp):**[3 TO  10]**) *
>
> and get the parent *document id data1* as response.
> I need to sum the experience of child documents only if the description
> contains Payroll keyword
> Please guide me how I can achieve the above query and response with my
> indexed documents
> --
>
> Thanks & Regards,
>
> Prathyusha Kondeti  | Software Engineer
>
> Software Development
>
>
>
> CEIPAL Solutions Pvt Ltd
>
> Prashanthi Towers, 4th Floor, Road No: 92, Jubilee Hills, Hyderabad -
> 500033, INDIA
>
> [O] +91-40-43515100  [M] +91 9848143513  [E]  prathyush...@ceipal.com  [W]
> www.ceipal.com
>
> 
>
> 
>
> This email and any files transmitted with it are confidential and intended
> solely for the use of the individual or entity to whom they are addressed.
> If you have received this email in error please notify the system manager.
> This message contains confidential information and is intended only for the
> individual named. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately by e-mail if you have received this e-mail by mistake and
> delete this e-mail from your system. If you are not the intended recipient
> you are notified that disclosing, copying, distributing or taking any
> action in reliance on the contents of this information is strictly
> prohibited.
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Partial results from streaming expressions (i.e. making them "stream")

2018-01-17 Thread Joel Bernstein
I'm not sure I understand the issue fully. From a streaming standpoint, you
get real streamed data from the /export handler. When you use the export
handler the bitset for the search results is materialized in memory, but
all results are sorted/streamed in batches. This allows the export handler
to export result sets of any size.

The underlying buffer sizes are really abstracted away and not meant to be
dealt with.

What's the higher level issue you are concerned with?


Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jan 17, 2018 at 8:54 AM, Radu Gheorghe 
wrote:

> Hello,
>
> I have some updates on this, but it's still not very clear for me how
> to move forward.
>
> The good news is, that between sources and decorators, data seems to
> be really streamed. I hope I tested this the right way, by simply
> adding a log message to ReducerStream saying "hey, I got this tuple".
> Now I have two nodes, nodeA with data and nodeB with a dummy
> collection. If I hit nodeB's /stream endpoint and ask it for, say, a
> unique() to wrap the previously mentioned expression (with 1s sleeps),
> I see a log from ReducerStream every second. Good.
>
> Now, the final result (to the client, via curl), only gets to me after
> N seconds (where N is the number of results I get). I did some more
> digging on this front, too. Let's assume we have chunked encoding
> re-enabled (that's a must) and no other change (if I flush() the
> FastWriter, say, after every tuple, then I get every tuple as it's
> computed, but I'm trying to explore the buffers). I've noticed the
> following:
> - the first response comes after ~64K, then I get chunks of 32K each
> - at this point, if I set response.setBufferSize() in
> HttpSolrCall.writeResponse() to a small size (say, 128), I get the
> first reply after 32K and then 8K chunks
> - I thought that maybe in this context I could lower BUFSIZE in
> FastWriter, but that didn't seem to make any change :(
>
> That said, I'm not sure it's worth looking into these buffers any
> deeper, because shrinking them might negatively affect other results
> (e.g. regular searches or facets). It sounds like the way forward
> would be that manual flushing, with chunked encoding enabled. I could
> imagine adding some parameters along the lines of "flush every N tuples
> or M milliseconds", that would be computed per-request, or at least
> globally to the /stream handler.
>
> What do you think? Would such a patch be welcome, to add these
> parameters? But it still requires chunked encoding - would reverting
> SOLR-8669 be a problem? Or maybe there's a more elegant way to enable
> chunked encoding, maybe only for streams?
>
> Best regards,
> Radu
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Mon, Jan 15, 2018 at 10:58 AM, Radu Gheorghe
>  wrote:
> > Hello fellow solr-users!
> >
> > Currently, if I do an HTTP request to receive some data via streaming
> > expressions, like:
> >
> > curl --data-urlencode 'expr=search(test,
> >q="foo_s:*",
> >fl="foo_s",
> >sort="foo_s asc",
> >qt="/export")'
> > http://localhost:8983/solr/test/stream
> >
> > I get all results at once. This is more obvious if I simply introduce
> > a one-second sleep in CloudSolrStream: with three documents, the
> > request takes about three seconds, and I get all three docs after
> > three seconds.
> >
> > Instead, I would like to get documents in a more "streaming" way. For
> > example, after X seconds give me what you already have. Or if an
> > Y-sized buffer fills up, give me all the tuples you have, then resume.
> >
> > Any ideas/opinions in terms of how I could achieve this? With or
> > without changing Solr's code?
> >
> > Here's what I have so far:
> > - this is normal with non-chunked HTTP/1.1. You get all results at
> > once. If I revert this patch[1] and get Solr to use chunked encoding,
> > I get partial results every... what seems to be a certain size between
> > 16KB and 32KB
> > - I couldn't find a way to manually change this... what I assume is a
> > buffer size, but failed so far. I've tried changing Jetty's
> > response.setBufferSize() in HttpSolrCall (maybe the wrong place to do
> > it?) and also tried changing the default 8KB buffer in FastWriter
> > - manually flushing the writer (in JSONResponseWriter) gives the
> > expected results (in combination with chunking)
> >
> > The thing is, even if I manage to change the buffer size, I assume
> > that will apply to all requests (not just streaming expressions). I
> > assume that ideally it would be configurable per request. As for
> > manual flushing, that would require changes to the streaming
> > expressions themselves. Would that be the way to go? What do you
> > think?
> >
> > [1] 

Re: Need help with solr highlighting feature

2018-01-17 Thread Aashish Agarwal
Hello Steve,

Sorry to disturb; the issue was due to a custom tokenizer that I used. Since
it was not storing offsets, the term vector was not working.
It's resolved now.
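
For reference, when highlighting relies on term vectors, the field generally needs
positions and offsets stored in those vectors. A minimal sketch of such a field
definition (field and type names are hypothetical):

  <field name="my_text" type="text_general" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

A custom tokenizer must also set correct start/end offsets on its tokens, otherwise the
stored term vectors are of no use to the highlighter, which matches the behaviour
described above.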

On Jan 17, 2018 11:06 PM, "Steve Rowe"  wrote:

> Hi Aashish,
>
> Which version of Solr are you using?
>
> Please share your configuration: highlighter and schema.
>
> --
> Steve
> www.lucidworks.com
>
> > On Jan 16, 2018, at 12:20 PM, Aashish Agarwal 
> wrote:
> >
> > Hello,
> >
> > I am using the Solr highlighting feature on a multivalued field containing
> > Korean words. The feature is not working as expected. Search is working fine,
> > but in the case of highlighting it gives the response as .
> >
> > I am storing term vector for the field and it is also stored=true.
> >
> > Please reply soon. Need this feature working urgently.
> >
> > Thanks,
> > Aashish
>
>


Solr Exception: Undefined Field

2018-01-17 Thread Deepak Goel
*Hello*

*In Solr Admin: I type the q parameter as - *

*text_entry:**

*It gives the following exception (In the schema I do see a field as
text_entry):*

{ "responseHeader":{ "zkConnected":true, "status":400, "QTime":2, "params":{
"q":"text_entry:*", "_":"1516190134181"}}, "error":{ "metadata":[
"error-class","org.apache.solr.common.SolrException", "root-error-class",
"org.apache.solr.common.SolrException"], "msg":"undefined field text_entry",
"code":400}}


*However when i type the q paramter as -*

*{!term f=text_entry}henry*

*This does give out the output as foll:*

{ "responseHeader":{ "zkConnected":true, "status":0, "QTime":0, "params":{ "
q":"{!term f=text_entry}henry", "_":"1516190134181"}}, "response":{"numFound
":262,"start":0,"docs":[ { "type":"line", "line_id":"80075",
"play_name":"Richard
II", "speech_number":"13", "line_number":"3.3.37", "speaker":"HENRY
BOLINGBROKE", "text_entry":"Henry Bolingbroke", "id":
"9428c765-a4e8-4116-937a-9b70e8a8e2de", "_version_":1588569205789163522, "
speaker_str":["HENRY BOLINGBROKE"], "text_entry_str":["Henry Bolingbroke"],
"line_number_str":["3.3.37"], "type_str":["line"], "play_name_str":["Richard
II"]}, {
**

Any ideas what is going wrong in the first q?

Thank You

Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"


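
One quick way to check which fields the schema in the queried core actually defines is
the Schema API (the collection name here is a placeholder):

  curl 'http://localhost:8983/solr/mycollection/schema/fields'

Comparing that list against the field used in q is a reasonable first step whenever the
lucene parser reports an "undefined field" error.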


Re: [EXTERNAL] Highlighter is not working for wildcard query

2018-01-17 Thread allen greg
UNSUBSCRIBE

On Wed, Jan 17, 2018 at 10:19 AM, David M Giannone 
wrote:

>
>
>
>
Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
>
>
>  Original message 
> From: Selvam Raman 
> Date: 1/17/18 11:47 AM (GMT-05:00)
> To: solr-user@lucene.apache.org
> Subject: [EXTERNAL] Highlighter is not working for wildcard query
>
> Hi,
>
> solr version 6.4.2
>
> hl.method = unified, hl.bs.type=Word: this setting is working fine for normal
> queries but fails for wildcard queries. (I tried other hl.bs.type parameter values, and
> without hl.bs.type as well; highlighting is not working for wildcard queries.)
>
> hl.method = original, this is working fine for both normal queries and
> wildcard queries.
>
> Why unified is not working and original/default is working fine for
> wildcard queries?
>
> any suggestion would be appreciated.
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>
>
> Nothing in this message is intended to constitute an electronic signature
> unless a specific statement to the contrary is included in this message.
>
> Confidentiality Note: This message is intended only for the person or
> entity to which it is addressed. It may contain confidential and/or
> privileged material. Any review, transmission, dissemination or other use,
> or taking of any action in reliance upon this message by persons or
> entities other than the intended recipient is prohibited and may be
> unlawful. If you received this message in error, please contact the sender
> and delete it from your computer.
>


Re: Need help with solr highlighting feature

2018-01-17 Thread Steve Rowe
Hi Aashish,

Which version of Solr are you using?

Please share your configuration: highlighter and schema.

--
Steve
www.lucidworks.com

> On Jan 16, 2018, at 12:20 PM, Aashish Agarwal  wrote:
> 
> Hello,
> 
> I am using the Solr highlighting feature on a multivalued field containing Korean
> words. The feature is not working as expected. Search is working fine, but in
> the case of highlighting it gives the response as .
> 
> I am storing term vector for the field and it is also stored=true.
> 
> Please reply soon. Need this feature working urgently.
> 
> Thanks,
> Aashish



Re: [EXTERNAL] Highlighter is not working for wildcard query

2018-01-17 Thread David M Giannone




Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone


 Original message 
From: Selvam Raman 
Date: 1/17/18 11:47 AM (GMT-05:00)
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Highlighter is not working for wildcard query

Hi,

solr version 6.4.2

hl.method = unified, hl.bs.type=Word: this setting is working fine for normal
queries but fails for wildcard queries. (I tried other hl.bs.type parameter values, and
without hl.bs.type as well; highlighting is not working for wildcard queries.)

hl.method = original, this is working fine for both normal queries and
wildcard queries.

Why unified is not working and original/default is working fine for
wildcard queries?

any suggestion would be appreciated.

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Nothing in this message is intended to constitute an electronic signature 
unless a specific statement to the contrary is included in this message.

Confidentiality Note: This message is intended only for the person or entity to 
which it is addressed. It may contain confidential and/or privileged material. 
Any review, transmission, dissemination or other use, or taking of any action 
in reliance upon this message by persons or entities other than the intended 
recipient is prohibited and may be unlawful. If you received this message in 
error, please contact the sender and delete it from your computer.


Re: Highlighter is not working for wildcard query

2018-01-17 Thread Selvam Raman
Query Parser
  defType=edismax

On Wed, Jan 17, 2018 at 4:47 PM, Selvam Raman  wrote:

> Hi,
>
> solr version 6.4.2
>
> hl.method = unified, hl.bs.type=Word: this setting is working fine for normal
> queries but fails for wildcard queries. (I tried other hl.bs.type parameter values, and
> without hl.bs.type as well; highlighting is not working for wildcard queries.)
>
> hl.method = original, this is working fine for both normal queries and
> wildcard queries.
>
> Why unified is not working and original/default is working fine for
> wildcard queries?
>
> any suggestion would be appreciated.
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>



-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Highlighter is not working for wildcard query

2018-01-17 Thread Selvam Raman
Hi,

solr version 6.4.2

hl.method = unified, hl.bs.type=Word: this setting is working fine for normal
queries but fails for wildcard queries. (I tried other hl.bs.type parameter values, and
without hl.bs.type as well; highlighting is not working for wildcard queries.)

hl.method = original, this is working fine for both normal queries and
wildcard queries.

Why unified is not working and original/default is working fine for
wildcard queries?

any suggestion would be appreciated.
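
For anyone hitting the same thing, a request using the workaround described above (the
collection and field names are placeholders) would look like:

  curl 'http://localhost:8983/solr/mycollection/select?q=text:high*&hl=true&hl.fl=text&hl.method=original'

Switching back to the unified highlighter is then just a matter of changing that one
hl.method parameter once the wildcard case works there.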

-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Re: Combine Results with Two different Collections.

2018-01-17 Thread Gus Heck
If you just want docs from both collections in the same results, create an
alias across the 2 collections.

https://lucene.apache.org/solr/guide/6_6/collections-api.html
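
A sketch of that call (the alias name is hypothetical; the collection names follow the
question quoted below):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=all_content&collections=Accounts,Content'

Queries sent to the alias (e.g. /solr/all_content/select?q=biodata) are then distributed
across both collections and merged into a single result list.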

On Thu, Jan 11, 2018 at 11:12 PM, Suman Saurabh 
wrote:

> Try using the Solr streaming API.
> https://lucene.apache.org/solr/guide/6_6/streaming-expressions.html
> Sample query:
> innerJoin(
> select(search(, q=, fl= select fields>, sort=, qt="/export"),  with alias >),
> select(search(search(, q=,
> fl=, sort=, qt="/export"),
> ),
> on=)
>
> Note :  Both collections must have at least one common field to perform the
> joining.
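
A more concrete version of that sketch, using the collection names from the question and
a hypothetical shared account_id field (both streams must be sorted on the join key):

  innerJoin(
    search(Accounts, q="biodata", fl="account_id,sample", sort="account_id asc", qt="/export"),
    search(Content,  q="biodata", fl="account_id,title",  sort="account_id asc", qt="/export"),
    on="account_id")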
>
> On Fri, Jan 12, 2018 at 4:57 AM, Fiz Newyorker 
> wrote:
>
> > Hi Team,
> >
> > Could you please let me know how to handle the below scenario.
> >
> > I have Two Collections  *Accounts & Content.*
> >
> > I am trying to search term *"biodata". *
> >
> > from Accounts Collections I am getting the output   *sample : your
> biodata*
> >
> > from Content Collections I am getting the output  *title : my biodata*
> >
> > *How do I combine these two results against the 2 different collections and
> > get the results? Can you let me know a sample Solr query to run?*
> >
> > Your help is much appreciated.
> >
> >
> > Thanks
> > Fiz.
> >
>
>
>
> --
> Thanks and Regards,
> Suman Saurabh
>



-- 
http://www.the111shift.com


Querying on sum of child documents

2018-01-17 Thread Prathyusha Kondeti
Hi,
 I have the following indexed documents
{
   "id":"data1",
   "isParent":"1",
   "_childDocuments":[
  {
 "description":
"Benefit and Payroll consultant with HR team support ",
 "isParent":"2",
 "exp":2
  },
  {
 "description":" ERP Avalon Implementation and Support Payroll",
 "isParent":"2",
 "exp":5
  },
  {
 "description":" lucene solr",
 "isParent":"2",
 "exp":2
  }
   ]

> }



How can I form a query as
*select?q=**:**=(description:Payroll AND sum(exp):[7  TO  10]) AND
**(description:support
AND sum(exp):**[3 TO  10]**) *

and get the parent *document id data1* as response.
I need to sum the experience of child documents only if the description
contains Payroll keyword
Please guide me how I can achieve the above query and response with my
indexed documents
-- 

Thanks & Regards,

Prathyusha Kondeti  | Software Engineer

Software Development



CEIPAL Solutions Pvt Ltd

Prashanthi Towers, 4th Floor, Road No: 92, Jubilee Hills, Hyderabad -
500033, INDIA

[O] +91-40-43515100  [M] +91 9848143513  [E]  prathyush...@ceipal.com  [W]
www.ceipal.com





This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you have received this email in error please notify the system manager.
This message contains confidential information and is intended only for the
individual named. If you are not the named addressee you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately by e-mail if you have received this e-mail by mistake and
delete this e-mail from your system. If you are not the intended recipient
you are notified that disclosing, copying, distributing or taking any
action in reliance on the contents of this information is strictly
prohibited.


Re: Multi-core logging - can't tell which core?

2018-01-17 Thread Shawn Heisey

On 1/17/2018 5:24 AM, Mark Sullivan wrote:

I am migrating a good number of cores over to the latest instance of solr 
(well, 7.1.0) installed locally.  It is working well, but my code is 
occasionally sending requests to search or index an old field that was replaced 
in the schema.

I see this in the logging, but I can't determine which core the log comes from. 
  How can I tell which core is receiving the offending requests?


Whatever you are referring to is not here.  The mailing list eats most 
attachments -- they don't make it through.


If you edit server/etc/jetty.xml you'll find a commented out section 
that enables request logging.  Remove the comment marks, restart Solr, 
and you'll get a log in server/logs that will include the source address 
for all requests.  You won't be able to tell which core sent the 
request, but narrowing it down to which server sent the request will 
probably be enough to investigate further.  You will need to make sure 
that you have good time synchronization on the servers so that 
timestamps in logs will match up.


Thanks,
Shawn


Re: modify number of Shard

2018-01-17 Thread Sushil K Tripathi
Thanks Erick!!!


With Warm Regards...
Sushil K. Tripathi



From: Erick Erickson 
Sent: Tuesday, January 16, 2018 11:02 AM
To: solr-user
Subject: Re: modify number of Shard

If you're using the default compositeId routing then this will not be
possible. You can increase to any multiple of 4 using SPLITSHARD, but
you can't just add an arbitrary number of shards.

If you're using the implicit router then you can add as many shards as you want.
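
For reference, a SPLITSHARD call looks like this (collection and shard names are
placeholders); each call splits one existing shard into two sub-shards:

  curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1'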

Best,
Erick

On Tue, Jan 16, 2018 at 7:38 AM, Sushil K Tripathi
 wrote:
> Team,
>
>
> We have an existing setup of a Solr cluster and we need to modify the current 
> number of shards from 4 to 6. Can somebody confirm if it is possible to modify 
> the number of shards, or do I just need to recreate the whole collection?
>
>
> With Warm Regards...
> Sushil K. Tripathi


Re: Partial results from streaming expressions (i.e. making them "stream")

2018-01-17 Thread Radu Gheorghe
Hello,

I have some updates on this, but it's still not very clear for me how
to move forward.

The good news is, that between sources and decorators, data seems to
be really streamed. I hope I tested this the right way, by simply
adding a log message to ReducerStream saying "hey, I got this tuple".
Now I have two nodes, nodeA with data and nodeB with a dummy
collection. If I hit nodeB's /stream endpoint and ask it for, say, a
unique() to wrap the previously mentioned expression (with 1s sleeps),
I see a log from ReducerStream every second. Good.

Now, the final result (to the client, via curl), only gets to me after
N seconds (where N is the number of results I get). I did some more
digging on this front, too. Let's assume we have chunked encoding
re-enabled (that's a must) and no other change (if I flush() the
FastWriter, say, after every tuple, then I get every tuple as it's
computed, but I'm trying to explore the buffers). I've noticed the
following:
- the first response comes after ~64K, then I get chunks of 32K each
- at this point, if I set response.setBufferSize() in
HttpSolrCall.writeResponse() to a small size (say, 128), I get the
first reply after 32K and then 8K chunks
- I thought that maybe in this context I could lower BUFSIZE in
FastWriter, but that didn't seem to make any change :(

That said, I'm not sure it's worth looking into these buffers any
deeper, because shrinking them might negatively affect other results
(e.g. regular searches or facets). It sounds like the way forward
would be that manual flushing, with chunked encoding enabled. I could
imagine adding some parameters along the lines of "flush every N tuples
or M milliseconds", that would be computed per-request, or at least
globally to the /stream handler.

What do you think? Would such a patch be welcome, to add these
parameters? But it still requires chunked encoding - would reverting
SOLR-8669 be a problem? Or maybe there's a more elegant way to enable
chunked encoding, maybe only for streams?

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Jan 15, 2018 at 10:58 AM, Radu Gheorghe
 wrote:
> Hello fellow solr-users!
>
> Currently, if I do an HTTP request to receive some data via streaming
> expressions, like:
>
> curl --data-urlencode 'expr=search(test,
>q="foo_s:*",
>fl="foo_s",
>sort="foo_s asc",
>qt="/export")'
> http://localhost:8983/solr/test/stream
>
> I get all results at once. This is more obvious if I simply introduce
> a one-second sleep in CloudSolrStream: with three documents, the
> request takes about three seconds, and I get all three docs after
> three seconds.
>
> Instead, I would like to get documents in a more "streaming" way. For
> example, after X seconds give me what you already have. Or if an
> Y-sized buffer fills up, give me all the tuples you have, then resume.
>
> Any ideas/opinions in terms of how I could achieve this? With or
> without changing Solr's code?
>
> Here's what I have so far:
> - this is normal with non-chunked HTTP/1.1. You get all results at
> once. If I revert this patch[1] and get Solr to use chunked encoding,
> I get partial results every... what seems to be a certain size between
> 16KB and 32KB
> - I couldn't find a way to manually change this... what I assume is a
> buffer size, but failed so far. I've tried changing Jetty's
> response.setBufferSize() in HttpSolrCall (maybe the wrong place to do
> it?) and also tried changing the default 8KB buffer in FastWriter
> - manually flushing the writer (in JSONResponseWriter) gives the
> expected results (in combination with chunking)
>
> The thing is, even if I manage to change the buffer size, I assume
> that will apply to all requests (not just streaming expressions). I
> assume that ideally it would be configurable per request. As for
> manual flushing, that would require changes to the streaming
> expressions themselves. Would that be the way to go? What do you
> think?
>
> [1] https://issues.apache.org/jira/secure/attachment/12787283/SOLR-8669.patch
>
> Best regards,
> Radu
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/


Multi-core logging - can't tell which core?

2018-01-17 Thread Mark Sullivan
I am migrating a good number of cores over to the latest instance of solr 
(well, 7.1.0) installed locally.  It is working well, but my code is 
occasionally sending requests to search or index an old field that was replaced 
in the schema.


I see this in the logging, but I can't determine which core the log comes from. 
  How can I tell which core is receiving the offending requests?


Many thanks in advance!


Mark



Re: How to implement the function of W/N in Solr?

2018-01-17 Thread Emir Arnautović
Hi,
Can you share the output with debugQuery=true? Note that 3w means there are at most 
2 words between those two phrases.
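
If the goal is two ordered word pairs within that distance, one thing worth trying is
nesting w() operators inside the distance operator instead of the quoted phrases
(whether the surround parser accepts quoted phrases at all is worth verifying) -- a
sketch using the placeholder terms from the original mail:

  q={!surround}3w(w(A, B), w(C, D))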

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 15 Jan 2018, at 09:04, xizhen.w...@incoshare.com wrote:
> 
> Hello,
> 
> I'm using Solr 4.10.3, and I want "A" and "B" to be together, "C" and "D" to be 
> together, and the terms "B" and "C" to be no more than 3 terms away from each 
> other, by using {!surround} 3w("A B", "C D"), but it doesn't work.  Is there 
> any other useful way?
> 
> Any help is appreciated.
> 
> 
> 
> xizhen.w...@incoshare.com



Re: Solr 7 spatial search and WKT

2018-01-17 Thread Emir Arnautović
Hi Leila,
I haven’t been using spatial in a while and did not test this, but based on 
the error, it seems that multivalue is not supported for this field type. Can you 
index a single MULTIPOLYGON? Why do you need to have multiple values? Can you 
flatten your geometry to a single MULTIPOLYGON or MULTIGEOMETRY (if supported)? 
Can you explain why you need to have a multiValued field?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 17 Jan 2018, at 00:09, Leila Deljkovic  
> wrote:
> 
> Hi all,
> 
> I need to index multiple POLYGONS/MULTIPOLYGONS per document; I’m trying to 
> use multiValued RptWithGeometrySpatialField and I’m getting this error:
> 
> Exception writing document id leila_test to the index; possible analysis 
> error: DocValuesField "gridcell_rpt" appears more than once in this document 
> (only one value is allowed per field)
> 
> This is what I’m indexing:
> {
>   "id": "leila_test",
>   "gridcell_rpt": ["POLYGON((30 10, 10 20, 20 40, 40 40, 30 10))”, 
> "MULTIPOLYGON(((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 
> 5)))"]
> }
> 
> This is what’s in my schema.xml:
>   
>   …
>  
>   distanceUnits=”kilometers” autoIndex="true”/>
> 
> I’m pretty confused on why this isn’t working. I can’t find an example of 
> multiValued RptWithGeometrySpatialField anywhere -_-
> 
> Thanks :)