RptWithGeometrySpatialField

2015-11-22 Thread William Bell
David, et al.


2 questions.



Is this a good tradeoff for performance?

Also, for what types of searches does RptWithGeometrySpatialField improve
performance? Does it support multivalued lat,lon fields?

Does LatLonType support multivalued lat,lon fields, or does only RPT allow
for this?


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-11-22 Thread Zheng Lin Edwin Yeo
I've made some minor modifications to the code in JiebaSegmenter.java, and
the highlighting seems to be fine now.

Basically, I created another int called offset2 in the process() method:

    int offset2 = 0;

Then I replaced offset with offset2 in this part of the process() method:

    if (sb.length() > 0)
        if (mode == SegMode.SEARCH) {
            for (Word token : sentenceProcess(sb.toString())) {
                // tokens.add(new SegToken(token, offset, offset += token.length()));
                tokens.add(new SegToken(token, offset2, offset2 += token.length())); // Change to offset2 by Edwin
            }
        } else {
            for (Word token : sentenceProcess(sb.toString())) {
                if (token.length() > 2) {
                    Word gram2;
                    int j = 0;
                    for (; j < token.length() - 1; ++j) {
                        gram2 = token.subSequence(j, j + 2);
                        if (wordDict.containsWord(gram2.getToken()))
                            // tokens.add(new SegToken(gram2, offset + j, offset + j + 2));
                            tokens.add(new SegToken(gram2, offset2 + j, offset2 + j + 2)); // Change to offset2 by Edwin
                    }
                }
                if (token.length() > 3) {
                    Word gram3;
                    int j = 0;
                    for (; j < token.length() - 2; ++j) {
                        gram3 = token.subSequence(j, j + 3);
                        if (wordDict.containsWord(gram3.getToken()))
                            // tokens.add(new SegToken(gram3, offset + j, offset + j + 3));
                            tokens.add(new SegToken(gram3, offset2 + j, offset2 + j + 3)); // Change to offset2 by Edwin
                    }
                }
                // tokens.add(new SegToken(token, offset, offset += token.length()));
                tokens.add(new SegToken(token, offset2, offset2 += token.length())); // Change to offset2 by Edwin
            }
        }


I'm not sure whether this is just a workaround or can be used as a permanent
solution.
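
A minimal sketch of the running-offset pattern discussed above, written in Python rather than taken from the jieba-analysis code. It assumes, as far as I understand the highlighter, that each token's start and end offsets are used to slice the original field text, so offsets that are shifted (for example by +1) no longer line up with the word. The function names and the sample segmentation are hypothetical.

```python
def tokens_with_offsets(words, start=0):
    """Assign (word, start, end) offsets to consecutive words.

    Each token starts where the previous one ended, which is the same
    idea as `offset2 += token.length()` in the Java snippet.
    """
    out = []
    offset = start
    for word in words:
        end = offset + len(word)
        out.append((word, offset, end))
        offset = end
    return out


def offsets_are_valid(text, tokens):
    """Return True if every token's offsets slice that same word out of text."""
    return all(
        0 <= start <= end <= len(text) and text[start:end] == word
        for word, start, end in tokens
    )


text = "我爱北京"
words = ["我", "爱", "北京"]  # hypothetical segmentation of `text`

good = tokens_with_offsets(words)
shifted = [(w, s + 1, e + 1) for w, s, e in good]  # the "+1" experiment

print(offsets_are_valid(text, good))     # True
print(offsets_are_valid(text, shifted))  # False
```

A check like `offsets_are_valid` could be run against the segmenter's output for a sample field value to see whether the offsets start at 0 for each field, which is what the offset2 change appears to guarantee.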

Regards,
Edwin


On 28 October 2015 at 15:29, Zheng Lin Edwin Yeo 
wrote:

> Hi Scott,
>
> I have tried editing the SegToken.java file in the jieba-analysis-1.0.0
> package, adding +1 to both the startOffset and endOffset values (see code
> below), and now the highlight tag in the content is shifted to the correct
> place. However, this means that in the title and other fields, where the
> highlight tag was originally in the correct place, I now get the
> "org.apache.lucene.search.highlight.InvalidTokenOffsetsException"
> exception. I am temporarily using another tokenizer for the other fields.
>
> public SegToken(Word word, int startOffset, int endOffset) {
> this.word = word;
> this.startOffset = startOffset+1;
> this.endOffset = endOffset+1;
> }
>
> However, I don't think this can be a permanent solution, so I'm digging
> further into the code to see what differs between the content field and
> the other fields.
>
> I have also found that although JiebaTokenizer works better for Chinese
> characters, it doesn't work well for English characters. For example, if I
> search for "water", JiebaTokenizer cuts it as follows:
> w|at|er
> It can't keep it as a full word, which HMMChineseTokenizer is able to do.
>
> Here's my configuration in schema.xml:
>
>  positionIncrementGap="100">
>  
>   segMode="SEARCH"/>
> 
> 
>  words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> 
>  maxGramSize="15"/>
>  
>  
>   segMode="SEARCH"/>
> 
> 
>  words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
> 
>   
>   
>
> Does anyone know if JiebaTokenizer is optimised to handle English
> characters as well?
>
> Regards,
> Edwin
>
>
> On 27 October 2015 at 15:57, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi Scott,
>>
>> Thank you for providing the links and references. Will look through them,
>> and let you know if I find any solutions or workaround.
>>
>> Regards,
>> Edwin
>>
>>
>> On 27 October 2015 at 11:13, Scott Chu  wrote:
>>
>>>
>>> Take a look at Michael's 2 articles; they might help you clarify the
>>> idea of highlighting in Solr:
>>>
>>> Changing Bits: Lucene's TokenStreams are actually graphs!
>>>
>>> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>>>
>>> Also take a look at the 4th paragraph in another article of his:
>>>
>>> Changing Bits: A new Lucene highlighter is born
>>>
>>> http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html
>>>
>>> Currently, I can't figure out the possible cause of your problem unless
>>> I get spare time to test it on my own, which is not available these days
>>> (got some projects to close)!
>>>
>>> If you find the solution or workaround, pls. let u

Re: geolocation search ignores distance parameter

2015-11-22 Thread PeterKerk
@Erik: thanks, overlooked that...added fq= before geofilt and now it works :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/geolocation-search-ignores-distance-parameter-tp4241564p4241571.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: geolocation search ignores distance parameter

2015-11-22 Thread Shawn Heisey
On 11/22/2015 5:39 PM, PeterKerk wrote:
> Why is the result below returned even though I'm filtering within a radius
> of 20 from the coordinates defined in the pt parameter of the query string?
> As you can see, the _dist_ value in this result is far larger than 20.
> 
> http://localhost:8983/solr/locs/select/?indent=on&facet=true{!geofilt}&pt=51.98,5.9&sfield=geolocation&d=20&sort=geodist()%20asc&q=*:*&start=0&rows=12&fl=id,_dist_:geodist(),lat,lng

Disclaimer:  I've never used any of the spatial features in Solr.  I
have no idea whether I've got right or wrong info here.

I did some digging in the documentation, and based on what I found, your
geofilt syntax looks wrong.  I'm reasonably sure that localparams like
{!geofilt} must be at the beginning of the main query or a filter query
-- they cannot stand alone, and attaching it to the facet parameter
seems incorrect.  Adjusting your parameter list so it looks right, I get
the following:

indent=on
facet=true
fq={!geofilt sfield=geolocation}
pt=51.98,5.9
d=20
sort=geodist() asc
q=*:*
start=0
rows=12
fl=id,_dist_:geodist(),lat,lng
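
A minimal sketch of building such a request in Python, so that `{!geofilt}` ends up as the value of an `fq` parameter and is URL-encoded properly. The host and core name are taken from the original poster's URL. As an assumption on my part, `sfield` is kept as a top-level parameter (as in the original URL) rather than inside the local params, because I believe the argument-less `geodist()` in `sort` and `fl` reads `sfield` and `pt` from the request. The function name is hypothetical.

```python
from urllib.parse import urlencode, parse_qs

# Host and core name as in the original poster's URL (local setup assumed).
BASE_URL = "http://localhost:8983/solr/locs/select"


def build_geofilt_url(lat, lon, d, sfield="geolocation"):
    """Return a select URL whose distance filter is a real fq parameter."""
    params = [
        ("indent", "on"),
        ("q", "*:*"),
        ("fq", "{!geofilt}"),           # the filter query itself
        ("sfield", sfield),             # spatial field, shared with geodist()
        ("pt", "%s,%s" % (lat, lon)),   # centre point as "lat,lon"
        ("d", str(d)),                  # radius for geofilt
        ("sort", "geodist() asc"),
        ("start", "0"),
        ("rows", "12"),
        ("fl", "id,_dist_:geodist(),lat,lng"),
    ]
    return BASE_URL + "?" + urlencode(params)


url = build_geofilt_url(51.98, 5.9, 20)
print(url)
print(parse_qs(url.split("?", 1)[1])["fq"])
```

Building the query string from a parameter list also avoids the kind of mistake in the original URL, where `{!geofilt}` was glued onto `facet=true` instead of being its own parameter.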

Thanks,
Shawn



Re: geolocation search ignores distance parameter

2015-11-22 Thread Erik Hatcher
Looks like your query doesn't actually pass geofilt as an fq parameter.

> On Nov 22, 2015, at 19:39, PeterKerk  wrote:
> 
> Why is the result below returned even though I'm filtering within a radius
> of 20 from the coordinates defined in the pt parameter of the query string?
> As you can see, the _dist_ value in this result is far larger than 20.
> 
> http://localhost:8983/solr/locs/select/?indent=on&facet=true{!geofilt}&pt=51.98,5.9&sfield=geolocation&d=20&sort=geodist()%20asc&q=*:*&start=0&rows=12&fl=id,_dist_:geodist(),lat,lng
> 
> 
>
> lng:    4.20579929967
> id:     1803
> lat:    51.5320753
> _dist_: 127.50432946951436
>
> 
> 
> schema.xml definitions
> 
> subFieldSuffix="_coordinate"/>
>
>
>
>
>
> 
> I tried adding this to the query string: &fq=_dist_:10
> 
> but then I get the error: undefined field _dist_
> 


geolocation search ignores distance parameter

2015-11-22 Thread PeterKerk
Why is the result below returned even though I'm filtering within a radius
of 20 from the coordinates defined in the pt parameter of the query string?
As you can see, the _dist_ value in this result is far larger than 20.

http://localhost:8983/solr/locs/select/?indent=on&facet=true{!geofilt}&pt=51.98,5.9&sfield=geolocation&d=20&sort=geodist()%20asc&q=*:*&start=0&rows=12&fl=id,_dist_:geodist(),lat,lng



lng:    4.20579929967
id:     1803
lat:    51.5320753
_dist_: 127.50432946951436

   

schema.xml definitions

  






I tried adding this to the query string: &fq=_dist_:10

but then I get the error: undefined field _dist_






Re: Json Facet api on nested doc

2015-11-22 Thread Yonik Seeley
On Sun, Nov 22, 2015 at 3:10 PM, Mikhail Khludnev
 wrote:
> Hello,
>
> I also played with json.facet, but couldn't achieve the desired result either.
>
> Yonik, Alessandro,
> Do you think it's a new feature or it can be achieved with the current
> implementation?

Not sure if I'm misunderstanding the example, but it looks straightforward.

terms facet on parent documents, with sub-facet on child documents.
I just committed a test for this, and it worked fine.  See
TestJsonFacets.testBlockJoin()
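
A minimal sketch of that request shape in Python: a terms facet on a parent field with a sub-facet that switches to the child documents. The field names (`department`, `salary`) and the `parent:true` block mask are taken from the request posted earlier in this thread, not from the test. Whether buckets come back depends on how the documents were indexed. The function name is hypothetical.

```python
import json
from urllib.parse import urlencode, parse_qs


def build_facet_body():
    """Return a form-encoded body with a parent terms facet and a child sub-facet."""
    facet = {
        "department": {
            "type": "terms",
            "field": "department",          # field on parent documents
            "facet": {
                "yearly-salaries": {
                    "type": "terms",
                    "field": "salary",      # field on child documents
                    # switch the domain from parents to their children
                    "domain": {"blockChildren": "parent:true"},
                }
            },
        }
    }
    return urlencode({"q": "*:*", "rows": "0", "json.facet": json.dumps(facet)})


body = build_facet_body()
print(body)
```

The resulting string can be sent as the POST body to the collection's /query handler, the same way the curl examples in this thread do with `-d`.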

Can we see an example of a parent document being indexed (i.e. along
with its child documents)?

-Yonik


Re: Security Problems

2015-11-22 Thread Don Bosco Durai
>You seem to be suggesting that the UI be broken down into components that can 
>be authorised independently.
Yes, this is what I was mostly concerned about. It doesn’t apply much to 
today’s Admin UI though...


>For myself, right now, I'm just keen to see that if authentication is required 
>for Solr, then authentication will be required for the UI too.
I agree with you: if it is just authentication for the current admin UI and 
authentication is enabled, then we should just authenticate all requests. 
In HTTP, authentication is done just once and the session cookie is used for 
subsequent requests, so this is an extremely low-cost (effectively no-cost) 
operation.

When I tested Solr with Kerberos, that is how it was already working. If you 
have not done “kinit” on your local machine, you can’t access any page. If 
you have done kinit, then you can access the pages, but most of the stuff will 
not work, because you can’t make the calls to get the required data. So, if 
you grant the person the permission for the action “solr_admin”, then 
everything should work.

I have not tested the BASIC auth implementation yet, so I can’t speak for it. 
But the underlying design/implementation should be the same, unless something 
changed since I originally tested just before 5.2 was released.

Bosco



On 11/22/15, 12:45 PM, "Upayavira"  wrote:

>Don,
>
>You seem to be suggesting that the UI be broken down into components
>that can be authorised independently. For example, a user who is allowed
>to query, but not to update, should not have access to UI elements (such
>as documents in its current incarnation) that allow updating. This is
>taking the UI further than I had considered, in this regard.
>
>For myself, right now, I'm just keen to see that if authentication is
>required for Solr, then authentication will be required for the UI too.
>
>My suggestion to put the UI within a request handler or such was to
>facilitate the authentication framework interacting with it, e.g. having
>an admin-ui (or just a UI) role covering the whole UI. More
>sophisticated UIs are, for sure, possible, but I for one haven't thought
>that far yet.
>
>Upayavira
>
>On Sat, Nov 21, 2015, at 09:42 PM, Don Bosco Durai wrote:
>> In a traditional web application, the URLs can be configured as
>> public->authenticated->authorized, which is very similar to what you are
>> suggesting. 
>> 
>> >I tried out BasicAuthPlugin today. Surprised that not admin UI is protected.
>> My suggestion would be to differentiate between the Web Interface and the
>> HTTP API interfaces, because trying to solve both using the same design will
>> not be very easy, and even if you do it, it will be a management nightmare.
>> 
>> 
>> 
>> >I'm very happy for the admin UI to be served another way - i.e. Not direct 
>> >from Jetty, if that makes the task of securing it easier.
>> 
>> Do you see a richer Solr UI? My understanding from talking with Anshum
>> (offline) was that the Solr Admin UI was only for Admin Users. So
>> technically you need to have “all” permission to access the Solr Admin
>> UI, which I think is a fair point.
>> 
>> If we are planning to provide an alternate UI, then your point is valid and
>> it gives more framework options to choose from. Traditional web
>> interfaces mostly deal with static pages and servlet/REST requests to get
>> data from the server. Frameworks like Spring give a lot of control over how
>> you want to configure these URLs. I know we are trying to get out of
>> Jetty and Tomcat, but frameworks like Spring make life a lot easier when
>> you are dealing with Web Interfaces. They can also support different
>> authentication schemes for the WebUI. I feel that, instead of reinventing
>> the wheel here, we should consider some framework which will give Web
>> Interface level access control and authentication.
>> 
>> 
>> For the APIs (/select, /query, etc…), I feel the current design is
>> pretty flexible. Of course, we should make sure all access paths are
>> controlled and easy to manage.
>> 
>> >Take the well-known permission “read” for instance. It protects /select and 
>> >/get. But it won’t protect /query, /browse, /export, /spell, /suggest, 
>> >/tvrh, /terms, /clustering or /elevate, all which also expose sensitive 
>> >info.
>> 
>> Apache Ranger provides one of the implementations of the Solr authorizer
>> interface. In 5.2, we supported the following actions:
>>   public static final String ACCESS_TYPE_CREATE = "create";
>>   public static final String ACCESS_TYPE_UPDATE = "update";
>>   public static final String ACCESS_TYPE_QUERY = "query";
>>   public static final String ACCESS_TYPE_OTHERS = "others";
>>   public static final String ACCESS_TYPE_ADMIN = "solr_admin";
>> 
>> My assumption was that the other actions would map to one of these. E.g.
>> /suggest will map to “query” and “/export” will map to “solr_admin”. My
>> assumption might be incorrect, but we mapped all unknown/non-standard
>> actions to “others”. I will review the code to see if I 

Re: Security Problems

2015-11-22 Thread Upayavira
Don,

You seem to be suggesting that the UI be broken down into components
that can be authorised independently. For example, a user who is allowed
to query, but not to update, should not have access to UI elements (such
as documents in its current incarnation) that allow updating. This is
taking the UI further than I had considered, in this regard.

For myself, right now, I'm just keen to see that if authentication is
required for Solr, then authentication will be required for the UI too.

My suggestion to put the UI within a request handler or such was to
facilitate the authentication framework interacting with it, e.g. having
an admin-ui (or just a UI) role covering the whole UI. More
sophisticated UIs are, for sure, possible, but I for one haven't thought
that far yet.

Upayavira

On Sat, Nov 21, 2015, at 09:42 PM, Don Bosco Durai wrote:
> In a traditional web application, the URLs can be configured as
> public->authenticated->authorized, which is very similar to what you are
> suggesting. 
> 
> >I tried out BasicAuthPlugin today. Surprised that not admin UI is protected.
> My suggestion would be to differentiate between the Web Interface and the
> HTTP API interfaces, because trying to solve both using the same design will
> not be very easy, and even if you do it, it will be a management nightmare.
> 
> 
> 
> >I'm very happy for the admin UI to be served another way - i.e. Not direct 
> >from Jetty, if that makes the task of securing it easier.
> 
> Do you see a richer Solr UI? My understanding from talking with Anshum
> (offline) was that the Solr Admin UI was only for Admin Users. So
> technically you need to have “all” permission to access the Solr Admin
> UI, which I think is a fair point.
> 
> If we are planning to provide an alternate UI, then your point is valid and
> it gives more framework options to choose from. Traditional web
> interfaces mostly deal with static pages and servlet/REST requests to get
> data from the server. Frameworks like Spring give a lot of control over how
> you want to configure these URLs. I know we are trying to get out of
> Jetty and Tomcat, but frameworks like Spring make life a lot easier when
> you are dealing with Web Interfaces. They can also support different
> authentication schemes for the WebUI. I feel that, instead of reinventing
> the wheel here, we should consider some framework which will give Web
> Interface level access control and authentication.
> 
> 
> For the APIs (/select, /query, etc…), I feel the current design is
> pretty flexible. Of course, we should make sure all access paths are
> controlled and easy to manage.
> 
> >Take the well-known permission “read” for instance. It protects /select and 
> >/get. But it won’t protect /query, /browse, /export, /spell, /suggest, 
> >/tvrh, /terms, /clustering or /elevate, all which also expose sensitive info.
> 
> Apache Ranger provides one of the implementations of the Solr authorizer
> interface. In 5.2, we supported the following actions:
>   public static final String ACCESS_TYPE_CREATE = "create";
>   public static final String ACCESS_TYPE_UPDATE = "update";
>   public static final String ACCESS_TYPE_QUERY = "query";
>   public static final String ACCESS_TYPE_OTHERS = "others";
>   public static final String ACCESS_TYPE_ADMIN = "solr_admin";
> 
> My assumption was that the other actions would map to one of these. E.g.
> /suggest will map to “query” and “/export” will map to “solr_admin”. My
> assumption might be incorrect, but we mapped all unknown/non-standard
> actions to “others”. I will review the code to see if I missed anything.
> 
> 
> Bosco
> 
> 
> 
> 
> On 11/20/15, 2:27 PM, "Jan Høydahl"  wrote:
> 
> >> ideally we should have a simple permission name called "all" (which we
> >> don't have)
> >> 
> >> so that one rule should be enough
> >> 
> >> "name":"all",
> >> "role":"somerole"
> >> 
> >> Open a ticket and we should fix it for 5.4.0
> >> It should also include  the admin paths as well
> >
> >Yes, that would be convenient.
> >
> >I still don’t like the existing "open-by-default” security mode of Solr. It 
> >is very fragile to mis-configuration without people noticing. Take the 
> >well-known permission “read” for instance. It protects /select and /get. But 
> >it won’t protect /query, /browse, /export, /spell, /suggest, /tvrh, /terms, 
> >/clustering or /elevate, all which also expose sensitive info.
> >
> >How about allowing a choice between three different security modes?
> >
> >-Dsolr.security.mode=open          : As today - paths not configured are wide open
> >-Dsolr.security.mode=authenticated : Paths not configured are open to any authenticated user
> >-Dsolr.security.mode=explicit      : Paths not configured are closed to all. All access is explicitly configured
> >
> >/Jan
> 
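
A minimal sketch of the decision logic behind the three security modes proposed in the quoted message above. Note that `solr.security.mode` is only a proposal in this thread, not an existing Solr setting as far as this discussion shows, and the function and parameter names here are hypothetical.

```python
def is_allowed(mode, path_configured, authenticated, rule_allows=False):
    """Decide access for one request under the proposed security modes.

    mode            -- "open", "authenticated" or "explicit" (proposed values)
    path_configured -- True if a permission rule covers the requested path
    authenticated   -- True if the caller has been authenticated
    rule_allows     -- outcome of the configured rule, when one exists
    """
    if path_configured:
        # A configured path is always decided by its own rule.
        return rule_allows
    if mode == "open":
        return True             # as today: unconfigured paths are wide open
    if mode == "authenticated":
        return authenticated    # any authenticated user may pass
    if mode == "explicit":
        return False            # nothing is reachable unless configured
    raise ValueError("unknown security mode: %r" % mode)


# An unconfigured path such as /export under each mode, for an anonymous caller:
for mode in ("open", "authenticated", "explicit"):
    print(mode, is_allowed(mode, path_configured=False, authenticated=False))
```

The point of the "explicit" mode in this sketch is that forgetting to configure a path fails closed rather than open.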


Re: Json Facet api on nested doc

2015-11-22 Thread Mikhail Khludnev
Hello,

I also played with json.facet, but couldn't achieve the desired result either.

Yonik, Alessandro,
Do you think it's a new feature or it can be achieved with the current
implementation?

On Thu, Nov 19, 2015 at 2:50 PM, xavi jmlucjav  wrote:

> Hi,
>
> I am trying to get some faceting with the json facet api on nested doc, but
> I am having issues. Solr 5.3.1.
>
> This query gets the bucket numbers OK:
>
> curl http://shost:8983/solr/collection1/query -d 'q=*:*&rows=0&
>  json.facet={
>    yearly-salaries : {
>      type: terms,
>      field: salary,
>      domain: { blockChildren : "parent:true" }
>    }
>  }
> '
> Salary is a field in child docs only. But if I add another facet outside
> it, the inner one returns no data:
>
> curl http://shost:8983/solr/collection1/query -d 'q=*:*&rows=0&
>  json.facet={
>    department:{
>      type: terms,
>      field: department,
>      facet:{
>        yearly-salaries : {
>          type: terms,
>          field: salary,
>          domain: { blockChildren : "parent:true" }
>        }
>      }
>    }
>  }
> '
> Results in:
>
> "facets":{
>   "count":3144071,
>   "department":{
>     "buckets":[{
>         "val":"Development",
>         "count":85707,
>         "yearly-salaries":{
>           "buckets":[]}},
>
>
> department is a field only in parent docs. Am I doing something wrong that
> I am missing?
> thanks
> xavi
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Data Import Handler / Backup indexes

2015-11-22 Thread Erick Erickson
These are just Lucene indexes. There's also the Cloud backup and restore
that is being worked on.

But if the index is static (i.e. not being indexed to), simply copying
the data/index directory (well, actually the whole data directory and its
subdirs) will back it up. Copying the index directory back
(I'd have Solr shut down when copying back) would restore the index.
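
A minimal sketch of that copy-based backup and restore in Python, using only the standard library. It assumes the index is static while copying and that Solr is stopped before restoring, as described above. The function names are hypothetical, and the paths to each core's data directory depend on the installation.

```python
import shutil
from pathlib import Path


def backup_index(data_dir, backup_dir):
    """Copy the whole data directory (index plus subdirs) to backup_dir."""
    src, dst = Path(data_dir), Path(backup_dir)
    if dst.exists():
        # Refuse to overwrite an earlier backup by accident.
        raise FileExistsError("backup target already exists: %s" % dst)
    shutil.copytree(src, dst)
    return dst


def restore_index(backup_dir, data_dir):
    """Replace data_dir with the backup. Solr should be shut down first."""
    src, dst = Path(backup_dir), Path(data_dir)
    if dst.exists():
        shutil.rmtree(dst)      # drop the current (possibly damaged) data
    shutil.copytree(src, dst)
    return dst
```

With two shards and two replicas each, this would have to be done per core on each node, for example calling `backup_index` on every core's data directory before starting the full-import.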

Best,
Erick

On Sat, Nov 21, 2015 at 10:12 PM, Brian Narsi  wrote:
> What are the caveats regarding the copy of a collection?
>
> At this time DIH takes only about 10 minutes. So in case of accidental
> delete we can just re-run the DIH. The reason I am thinking about backup is
> just in case records are deleted accidentally and the DIH cannot be run
> because the database is unavailable.
>
> Our collection is simple: 2 nodes - 1 collection - 2 shards with 2 replicas
> each
>
> So a simple copy (cp command) for both the nodes/shards might work for us?
> How do I restore the data back?
>
>
>
> On Tue, Nov 17, 2015 at 4:56 PM, Jeff Wartes  wrote:
>
>>
>> https://github.com/whitepages/solrcloud_manager supports 5.x, and I added
>> some backup/restore functionality similar to SOLR-5750 in the last
>> release.
>> Like SOLR-5750, this backup strategy requires a shared filesystem, but
>> note that unlike SOLR-5750, I haven’t yet added any backup functionality
>> for the contents of ZK. I’m currently working on some parts of that.
>>
>>
>> Making a copy of a collection is supported too, with some caveats.
>>
>>
>> On 11/17/15, 10:20 AM, "Brian Narsi"  wrote:
>>
>> >Sorry I forgot to mention that we are using SolrCloud 5.1.0.
>> >
>> >
>> >
>> >On Tue, Nov 17, 2015 at 12:09 PM, KNitin  wrote:
>> >
>> >> afaik Data import handler does not offer backups. You can try using the
>> >> replication handler to backup data as you wish to any custom end point.
>> >>
>> >> You can also try out: https://github.com/bloomreach/solrcloud-haft
>> >> This helps back up Solr indices across clusters.
>> >>
>> >> On Tue, Nov 17, 2015 at 7:08 AM, Brian Narsi wrote:
>> >>
>> >> > I am using Data Import Handler to retrieve data from a database with
>> >> >
>> >> > full-import, clean = true, commit = true and optimize = true
>> >> >
>> >> > This has always worked correctly without any errors.
>> >> >
>> >> > But just to be on the safe side, I am thinking that we should do a
>> >> > backup before initiating Data Import Handler. And just in case
>> >> > something happens, restore the backup.
>> >> >
>> >> > Can backup be done automatically (before initiating Data Import
>> >> > Handler)?
>> >> >
>> >> > Thanks
>> >> >
>> >>
>>
>>