RE: [E] Re: Questions about Disk space Usage

2016-10-31 Thread Jamal, Sarfaraz
Thank you all for your comments and help -

I kept the last day's worth of files from the /tmp folder and removed the rest, 
without any problems or difficulties.

Sas
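A minimal Python sketch of the cleanup described above: it lists (or deletes) files in a directory whose modification time is older than one day. The directory, the glob pattern and the one-day cutoff are assumptions for illustration. It defaults to a dry run, and it only makes sense for files Solr is no longer using.

```python
import time
from pathlib import Path

ONE_DAY = 24 * 60 * 60  # seconds


def stale_files(directory, pattern="*", max_age=ONE_DAY, now=None):
    """Return files matching `pattern` whose mtime is older than max_age seconds."""
    now = time.time() if now is None else now
    return sorted(
        p for p in Path(directory).glob(pattern)
        if p.is_file() and now - p.stat().st_mtime > max_age
    )


def remove_stale_files(directory, pattern="*", max_age=ONE_DAY, dry_run=True):
    """Delete (or, by default, only report) files older than max_age."""
    victims = stale_files(directory, pattern, max_age)
    for path in victims:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
    return victims
```

For example, `remove_stale_files("/tmp", pattern="*suggest*", dry_run=False)`, where the pattern is a hypothetical guess at how the autosuggest files are named.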

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Saturday, October 29, 2016 1:10 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Questions about Disk space Usage

If it works the way I think it does, an empty segment should take the same 
amount of time to read in as a full segment, but zero time to write out.

wunder 

> On Oct 29, 2016, at 9:21 AM, Erick Erickson  wrote:
> 
> I would also expect a totally empty segment to be merged very quickly 
> as the percentage of deleted documents weighs heavily when determining 
> whether to merge a segment, but that's based on principle, not deep 
> code knowledge.
> 
> Best,
> Erick
> 
>> On Fri, Oct 28, 2016 at 6:02 PM, Walter Underwood  
>> wrote:
>> After the merge. That is what merges do, clean up segments.
>> 
>> I expect it is very rare for a segment to be 100% deleted docs, so it 
>> isn’t worth handling that case.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Oct 28, 2016, at 5:54 PM, Alexandre Rafalovitch  
>>> wrote:
>>> 
>>> Doesn't a segment that has only deleted documents just get dropped?
>>> Or does it get dropped _after_ the merge and therefore still sit 
>>> around?
>>> 
>>> Regards,
>>>  Alex.
>>> 
>>> Solr Example reading group is starting November 2016, join us at 
>>> http://j.mp/SolrERG Newsletter and resources for Solr beginners and 
>>> intermediates:
>>> http://www.solr-start.com/
>>> 
>>> 
>>>> On 29 October 2016 at 08:53, Walter Underwood  
>>>> wrote:
>>>> It is normal for disk usage to double. Under controlled 
>>>> circumstances, it can triple, but that probably won’t happen.
>>>> 
>>>> This is the second time today that I’ve sent this information to the list.
>>>> 
>>>> It can use nearly 2X the space whenever the largest segment(s) are 
>>>> merged, especially if there are only a few smaller segments.
>>>> 
>>>> In order to use 3X the space, you need to:
>>>> 
>>>> 1. Disable merging.
>>>> 2. Delete all the documents.
>>>> 3. Add all the documents.
>>>> 4. Enable merging.
>>>> 
>>>> This causes one complete set of segments that are 100% deletes, one 
>>>> set that is 0% deletes, then the merge creates another set that is 
>>>> 0% deletes. During the merge, the old files remain while the new 
>>>> one is created.
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>>> 
>>>>> On Oct 28, 2016, at 2:41 PM, Alexandre Rafalovitch  
>>>>> wrote:
>>>>> 
>>>>> 2) Is probably a merge operation. Lucene index segments are not 
>>>>> rewritable in place, so the merge creates a new file, does 
>>>>> everything to it, then switches to it.
>>>>> 
>>>>> I remember the number was that the space could temporarily triple
>>>>> (?!?) though that may have been before the tiered merge policy.
>>>>> 
>>>>> 3) It should be safe to delete old log files. It is standard log4j stuff.
>>>>> 
>>>>> 
>>>>> 
>>>>> On 29 October 2016 at 06:55, Jamal, Sarfaraz 
>>>>>  wrote:
>>>>>> Hi Guys,
>>>>>> 
>>>>>> I am currently investigating an instance of Solr's Disk space usage and 
>>>>>> I had a few questions I thought you guys might be able to help answer.
>>>>>> 
>>>>>> First Question
>>>>>> * There is 30 GB worth of autosuggest data in the /tmp folder. 
>>>>>> Each file is half a gigabyte. Is it safe to delete those files?
>>>>>> 
>>>>>> Second Question
>>>>>> Also, we notice that at times the disk runs down to only having a few 
>>>>>> gigabytes available, and then goes back to having more space (the index 
>>>>>> files literally grow and then shrink).
>>>>>> 
>>>>>> Third Question
>>>>>> Is it also safe to delete the log files?
>>>>>> 
>>>>>> We run a database indexer on a set interval; perhaps that is relevant to 
>>>>>> this discussion.
>>>>>> 
>>>>>> Sas
>>>> 
>> 
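Walter's explanation of why disk use can roughly double, or in the worst case triple, can be expressed as a simple model: every existing segment stays on disk until the merge finishes, and the merged output holds only the live (non-deleted) documents. This is a sketch of that reasoning with made-up segment sizes, not a description of Lucene's actual merge code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    size_gb: float
    deleted_fraction: float  # 0.0 = no deleted docs, 1.0 = every doc deleted


def peak_during_merge(segments, merging):
    """Rough peak disk use while the segments at indices `merging` are merged.

    Assumption: old segment files are kept until the new merged segment is
    complete, and the new segment contains only the live documents.
    """
    on_disk = sum(s.size_gb for s in segments)
    output = sum(segments[i].size_gb * (1 - segments[i].deleted_fraction)
                 for i in merging)
    return on_disk + output
```

With a 10 GB index and no deletes, merging everything peaks at about 20 GB (2X). In the "disable merging, delete all, re-add all" scenario, a 100%-deleted set and a 0%-deleted set are both on disk while the merged copy is written, giving about 30 GB (3X).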



Questions about Disk space Usage

2016-10-28 Thread Jamal, Sarfaraz
Hi Guys,

I am currently investigating an instance of Solr's Disk space usage and I had a 
few questions I thought you guys might be able to help answer.

First Question
* There is 30 GB worth of autosuggest data in the /tmp folder. Each file is 
half a gigabyte.
Is it safe to delete those files?

Second Question
Also, we notice that at times the disk runs down to only having a few gigabytes 
available, and then goes back to having more space (the index files literally 
grow and then shrink).

Third Question
Is it also safe to delete the log files?

We run a database indexer on a set interval; perhaps that is relevant to this 
discussion.

Sas 


RE: Question about Simple Post tool

2016-08-01 Thread Jamal, Sarfaraz
Thank you.

That is a great suggestion -

Sas



-Original Message-
From: Scott Chu [mailto:scott@udngroup.com] 
Sent: Monday, August 1, 2016 10:21 AM
To: solr-user 
Subject: Re: Question about Simple Post tool


I don't think it's possible purely using the out-of-box post.jar. But why not 
disassemble post.jar (or get the source from internet) and modify it yourself. 
It seems not that hard.

Scott Chu,scott@udngroup.com
2016/8/1 (Mon)
- Original Message ----- 
From: Jamal, Sarfaraz 
To: solr-user 
CC: 
Date: 2016/8/1 (Mon) 22:05
Subject: Question about Simple Post tool


Hi Guys, 

I have a quick question. 

I read the appropriate documentation and it seems that it is possible, but I 
might be getting the syntax wrong. 

I wish to use the Simple Post Tool to pass in a URL that returns a Word 
document, and I want to index what that URL returns using Tika - 

Is that possible? Or do I have to get the file onto my file system first? 

Thanks, 

Sas 




Question about Simple Post tool

2016-08-01 Thread Jamal, Sarfaraz
Hi Guys,

I have a quick question.

I read the appropriate documentation and it seems that it is possible, but I 
might be getting the syntax wrong.

I wish to use the Simple Post Tool to pass in a URL that returns a Word 
document, and I want to index what that URL returns using Tika -

Is that possible? Or do I have to get the file onto my file system first?

Thanks,

Sas
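If the stock post.jar cannot do this, one workaround is to download the document yourself and send the bytes straight to Solr's extracting request handler (/update/extract, which uses Tika), without writing the file to disk. A minimal sketch; the Solr base URL, collection name, document URL and id are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode


def fetch(url):
    """Download the raw bytes and content type of a remote document (e.g. a .docx)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read(), resp.headers.get_content_type()


def build_extract_request(solr_base, collection, doc_id, payload, content_type):
    """Build (but do not send) a POST to /update/extract.

    literal.id sets the document's unique key; commit=true makes it visible at once.
    """
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{solr_base}/{collection}/update/extract?{params}"
    return urllib.request.Request(url, data=payload, method="POST",
                                  headers={"Content-Type": content_type})
```

Usage would look like `payload, ctype = fetch("https://example.com/report.docx")`, then `urllib.request.urlopen(build_extract_request("http://localhost:8983/solr", "docs", "report-1", payload, ctype))`.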


RE: search documents that have a specific field populated

2016-07-15 Thread Jamal, Sarfaraz
If I understand you properly, I do it using a Filter Query:

fq=NOT(field:EMPTY)

Hope that helps -

Sas


-Original Message-
From: Valentina Cavazza [mailto:valent...@step-net.it] 
Sent: Friday, July 15, 2016 10:17 AM
To: solr-user@lucene.apache.org
Subject: search documents that have a specific field populated

Hi,
I need to search documents that have a specific field populated, so I want to 
display all the documents that have the field not empty.
This field in schema is set multivalued=true, indexed=true, stored=true, 
default=EMPTY.
This field type is solr.TextField class, use StandardTokenizerFactory 
tokenizer, ICUFoldingFilterFactory filter, LowerCaseFilterFactory filter and 
GreekStemFilterFactory filter in the index and query analyzers.
I already tried queries like this:
q=field:*
q=+field:*
q=+field:[* TO *]
q=+field:['' TO *]
q=+field:["" TO *]
q=+field:[' ' TO *]
q=+field:' '
q=-field:EMPTY

but nothing was found.
Does anyone know how to do that?
Thanks

Valentina
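Because unpopulated documents receive the schema default EMPTY, one way to express "field is populated" is to match everything and filter out that placeholder, the same idea as the fq in the reply above (Solr accepts a purely negative filter query). A minimal sketch that builds the request parameters; the field name is hypothetical.

```python
from urllib.parse import urlencode


def populated_field_query(field, placeholder="EMPTY", rows=10):
    """Query string matching docs whose `field` holds something other than the default.

    Docs with no value were indexed with the default placeholder, so excluding
    that token in a filter query leaves only the populated ones.
    """
    return urlencode({"q": "*:*", "fq": f"-{field}:{placeholder}", "rows": rows})
```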


RE: Simple Post Tool result question (UNCLASSIFIED)

2016-07-14 Thread Jamal, Sarfaraz
I am not entirely sure what you mean,

But extra slashes in the middle of a url produce the same result as a single 
slash (right?).

So for example:
https://www.visualstudio.com/downloads///download-visual-studio-vs

is the same as:
https://www.visualstudio.com/downloads/download-visual-studio-vs
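If you want URLs to be stored without the repeated slashes, a small normalization step could be applied before posting. This sketch collapses runs of "/" in the path only, leaving the scheme's "//" and the query string alone. It assumes the server treats duplicate slashes as one, which is common but not guaranteed.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Collapse repeated '/' in the path component of a URL."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=re.sub(r"/{2,}", "/", parts.path)))
```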


-Original Message-
From: Musshorn, Kris T CTR USARMY RDECOM ARL (US) 
[mailto:kris.t.musshorn@mail.mil] 
Sent: Thursday, July 14, 2016 2:09 PM
To: solr-user@lucene.apache.org
Subject: Simple Post Tool result question (UNCLASSIFIED)

CLASSIFICATION: UNCLASSIFIED

POSTed web resource https://xx/inside/news/dispatches///view.cfm?id=9128 
(depth: 4)

What is the significance of the /// ?

Thanks,
Kris

~~
Kris T. Musshorn
FileMaker Developer - Contractor - Catapult Technology Inc.  
US Army Research Lab 
Aberdeen Proving Ground 
Application Management & Development Branch 
410-278-7251
kris.t.musshorn@mail.mil
~~



CLASSIFICATION: UNCLASSIFIED


RE: SimplePost tool (UNCLASSIFIED)

2016-07-14 Thread Jamal, Sarfaraz
In my experience - and if I recall correctly -

If the ids are different but the file is the same, you will end up with two 
separate indexed documents -

Sas



Sarfaraz Jamal (Sas)
Revenue Assurance Tech Ops
614-560-8556
sarfaraz.ja...@verizonwireless.com


-Original Message-
From: Musshorn, Kris T CTR USARMY RDECOM ARL (US) 
[mailto:kris.t.musshorn@mail.mil] 
Sent: Thursday, July 14, 2016 12:37 PM
To: solr-user@lucene.apache.org
Subject: SimplePost tool (UNCLASSIFIED)

CLASSIFICATION: UNCLASSIFIED

Does the simple post tool accomplish deduplication?

Thanks,
Kris

~~
Kris T. Musshorn
FileMaker Developer - Contractor - Catapult Technology Inc.  
US Army Research Lab 
Aberdeen Proving Ground 
Application Management & Development Branch 
410-278-7251
kris.t.musshorn@mail.mil
~~



CLASSIFICATION: UNCLASSIFIED
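Solr overwrites a document whose unique key already exists, so one client-side way to deduplicate is to derive the id from the file's content rather than its name; Solr also offers server-side de-duplication via SignatureUpdateProcessorFactory. A minimal sketch of the client-side idea, using SHA-1 of the bytes as a hypothetical id scheme.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive a document id from the file's bytes, so identical files share an id."""
    return hashlib.sha1(data).hexdigest()


def dedupe(docs):
    """Keep the last (name, bytes) pair per content id, like overwrite-by-uniqueKey."""
    by_id = {}
    for name, data in docs:
        by_id[content_id(data)] = name
    return by_id
```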


RE: Update index

2016-07-13 Thread Jamal, Sarfaraz
Hi Kostali,

I would look at the Delta Queries -

Sas
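A delta import only picks up changed rows if the entity in data-config.xml defines deltaQuery and deltaImportQuery; it then has to be triggered, for example from a scheduled job. A minimal sketch that builds the trigger URL; the host, core name and the /dataimport handler path are assumptions.

```python
from urllib.parse import urlencode


def dih_command_url(solr_base, core, command="delta-import", commit=True, clean=False):
    """URL asking the Data Import Handler (assumed at /dataimport) to run a command."""
    params = urlencode({
        "command": command,
        "commit": str(commit).lower(),
        "clean": str(clean).lower(),  # don't wipe the index before a delta
    })
    return f"{solr_base}/{core}/dataimport?{params}"
```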

-Original Message-
From: kostali hassan [mailto:med.has.kost...@gmail.com] 
Sent: Wednesday, July 13, 2016 5:17 AM
To: solr-user@lucene.apache.org
Subject: Update index

I am using Solr 5.4.1 to index a SQL database with the Data Import Handler.
I am looking for a way to update the index automatically when the database is 
modified or new values are inserted into it.


RE: Searching Home's, Homes and Home

2016-07-08 Thread Jamal, Sarfaraz
I would start by looking at the stemming documentation -

It might be of help.

Sas


-Original Message-
From: Surender [mailto:surender.si...@rsystems.com] 
Sent: Friday, July 8, 2016 8:30 AM
To: solr-user@lucene.apache.org
Subject: Searching Home's, Homes and Home

Users can type a search keyword in many ways; the following are a few 
examples.
If the user types any of the keywords homes, home or home's, the search should 
match all of the following:
1. Home
2. Home's
3. Homes

If the user types Americas, the results should include:
1. Americas
2. America's
3. America

Please suggest how to send the search query to Solr to include all the results.
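The idea behind the stemming suggestion is that the analyzer reduces all of these forms to one indexed term, at both index and query time. The toy function below only illustrates that idea (strip a possessive "'s", then a crude plural "s"); it is not Solr's real analysis, where something like EnglishPossessiveFilterFactory plus a stemmer would play this role.

```python
import re


def toy_normalize(token):
    """Toy stand-in for a possessive filter plus a light plural stemmer."""
    t = token.lower()
    t = re.sub(r"'s$", "", t)  # home's -> home
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]             # homes -> home (very crude)
    return t
```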





RE: SOLR 6: edismax search query with OR operator does not work as expected

2016-07-08 Thread Jamal, Sarfaraz
This sounds like it might be of help -

<solrQueryParser defaultOperator="AND"/>

You can change it from AND to OR.

(If I understood you correctly) -

Sas


-Original Message-
From: Aleš Gregor [mailto:alg...@gmail.com] 
Sent: Friday, July 8, 2016 9:37 AM
To: solr-user@lucene.apache.org
Subject: SOLR 6: edismax search query with OR operator does not work as expected

Hello,

after migrating my index from Solr 4.3 to Solr 6 I noticed that the OR logical 
operator in search query no longer works as expected.

On Solr 4.3 query - Blue OR Red - brings all documents with Blue or Red or both 
tokens found.
On Solr 6 the same query only brings documents with both the tokens, Blue and 
Red.

I see some difference in the debug of the query but I cannot  make much sense 
out of it.

Was there any change between Solr 4 and 6 that would cause this?

Thanks
Ales Gregor
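Besides the schema-level defaultOperator, the default operator can be set per request with the q.op parameter. A minimal sketch of edismax request parameters; the qf field name is hypothetical, and the edismax mm (minimum should match) setting is another thing worth checking in the debug output.

```python
from urllib.parse import urlencode


def edismax_or_params(q, qf):
    """Params for an edismax query whose bare terms default to OR."""
    return {
        "defType": "edismax",
        "q": q,
        "qf": qf,
        "q.op": "OR",  # per-request default operator
    }
```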


RE: Some questions

2016-07-07 Thread Jamal, Sarfaraz
Of course, yes -=)

Sas



Sarfaraz Jamal (Sas)
Revenue Assurance Tech Ops
614-560-8556
sarfaraz.ja...@verizonwireless.com

-Original Message-
From: Siwei Lv [mailto:si...@microsoft.com] 
Sent: Thursday, July 7, 2016 4:40 AM
To: solr-user@lucene.apache.org
Subject: Some questions

Hi all,

I have some questions about Solr. Can I send them to this mailing list?

Thanks,
Siwei


RE: Solr more like this

2016-07-06 Thread Jamal, Sarfaraz
Could you index it, run the 'more like this' query, and then delete it from the index?

All in one smooth user experience obviously.

(Just throwing it out there).

Sas



-Original Message-
From: Charlie Hull [mailto:char...@flax.co.uk] 
Sent: Wednesday, July 6, 2016 11:02 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr more like this

On 05/07/2016 19:42, sara hajili wrote:
> Hi
> I indexed PDF files to Solr, and now I want to know whether there is any way 
> to upload a PDF file and have Solr return related PDFs in the results.
> I mean I don't want to index the PDF file (the file I want to get 
> more-like-this results for); I just want to upload it and get MLT 
> results. Can I do this?
>
If Solr hasn't indexed a PDF file, it can't work out its 'like this'. 
So I'd say, no, you can't.

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Question about Indexing Updated Documents

2016-07-01 Thread Jamal, Sarfaraz
Hi Guys,

I have a data-import handler set up that indexes all of the documents from a 
few small tables.

What is the best way to update the index when a single one of those documents 
changes?

Is it possible to use SQL, or must I post JSON or XML to Solr?

Thank you,

Sas
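Posting JSON is one option: sending a document whose unique key matches an existing one replaces it. A minimal sketch that builds (but does not send) such a request; the base URL, collection, the "id" unique key and the field names are assumptions. The SQL route would instead be a DIH delta-import.

```python
import json
import urllib.request


def build_update_request(solr_base, collection, doc, commit_within_ms=1000):
    """Build a JSON update that adds or replaces one document.

    An existing doc with the same uniqueKey (assumed to be "id") is overwritten.
    """
    url = f"{solr_base}/{collection}/update?commitWithin={commit_within_ms}"
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})
```

Sending it is then `urllib.request.urlopen(req)`.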


Thank You Guys

2016-06-17 Thread Jamal, Sarfaraz
Hi Guys,

Thank you all - I got synonyms, highlighting, stemming all working the way I 
wanted to.

I am sure I will have more questions later on =)

Thanks!

Sas


RE: [E] Re: Stemming

2016-06-16 Thread Jamal, Sarfaraz
Oh, is this what you meant?

  

  content_stemming
  

  

I changed it to content_stemming and now it seems to work :) - It was _text_ 
before -

Thanks! I will update if I discover anything amiss

Thanks again so much =)

Sas

-Original Message-
From: Aurélien MAZOYER [mailto:aurelien.mazo...@francelabs.com] 
Sent: Thursday, June 16, 2016 4:36 PM
To: solr-user@lucene.apache.org
Subject: Re: [E] Re: Stemming

Hi,

I was just wondering whether you are sure that you query only that field (or fields 
that use your text_stem analyzer) and not other fields (in your qf, for example, 
if you use edismax) that can give you incorrect results.

Regards,

Aurélien
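Aurélien's point is that run, runs and running only return the same results if the query hits the stemmed field, not a catch-all like _text_. A minimal sketch of parameters that do that with edismax, using the content_stemming field mentioned in this thread; with the standard parser, the df parameter plays the same role.

```python
from urllib.parse import urlencode


def stemmed_search_params(term, field="content_stemming"):
    """Query string that searches only the stemmed field."""
    return urlencode({"q": term, "defType": "edismax", "qf": field})
```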

Le 16/06/2016 22:29, Jamal, Sarfaraz a écrit :
> Hello =)
>
> Just to be safe and make sure it's happening at indexing time AS WELL 
> as QUERYING time -
>
> I modified it to be like so:
>
>
>   
> 
>  words="lang/stopwords_en.txt" ignoreCase="true"/>
> 
> 
>  protected="protwords.txt"/>
> 
>   
>   
> 
>  words="lang/stopwords_en.txt" ignoreCase="true"/>
> 
> 
>  protected="protwords.txt"/>
> 
>
>
>
> I am re-indexing the files
> And what do you mean about only querying one field? I am not entirely sure I 
> understand..
>
> Sas
>
> -Original Message-
> From: Aurélien MAZOYER [mailto:aurelien.mazo...@francelabs.com]
> Sent: Thursday, June 16, 2016 4:20 PM
> To: solr-user@lucene.apache.org
> Subject: [E] Re: Stemming
>
> Hi,
>
> Yes you should have the same resultset.
>
> Are you sure that you reindexed all the data after changing your schema?
> Are you sure that you put your analyzer both at indexing and querying?
> Are you sure you query only one field?
>
> Regards,
>
> Aurélien
>
> Le 16/06/2016 21:13, Jamal, Sarfaraz a écrit :
>> Hi Guys,
>>
>> I have enabled stemming:
>> 
>>  
>>  
>>  > language="English"/>
>>  
>> 
>>
>> In the Admin Analysis, I type in running or runs and they both break down to 
>> run.
>> However when I search for run, runs, or running with an actual query 
>> -
>>
>> It brings back three different sets of results.
>>
>> Is that correct?
>>
>> I would imagine that all three would bring back the exact same resultset?
>>
>> Sas
>>



RE: [E] Re: Stemming

2016-06-16 Thread Jamal, Sarfaraz
Hello =)

Just to be safe and make sure it's happening at indexing time AS WELL as 
QUERYING time -

I modified it to be like so:

  

  
  
  
  
  
  


  
  
  
  
  
  
 
  

I am re-indexing the files
And what do you mean about only querying one field? I am not entirely sure I 
understand..

Sas

-Original Message-
From: Aurélien MAZOYER [mailto:aurelien.mazo...@francelabs.com] 
Sent: Thursday, June 16, 2016 4:20 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Stemming

Hi,

Yes you should have the same resultset.

Are you sure that you reindexed all the data after changing your schema?
Are you sure that you put your analyzer both at indexing and querying?
Are you sure you query only one field?

Regards,

Aurélien

Le 16/06/2016 21:13, Jamal, Sarfaraz a écrit :
> Hi Guys,
>
> I have enabled stemming:
>
>   
>   
>language="English"/>
>   
>
>
> In the Admin Analysis, I type in running or runs and they both break down to 
> run.
> However when I search for run, runs, or running with an actual query -
>
> It brings back three different sets of results.
>
> Is that correct?
>
> I would imagine that all three would bring back the exact same resultset?
>
> Sas
>



RE: [E] Re: Stemming

2016-06-16 Thread Jamal, Sarfaraz
Hi Ahmet,

Thanks for your guidance.

I just tried the following two configurations:

  





  

And

  

  
  
  
  
  
  

  

They both produced three different sets of results.

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] 
Sent: Thursday, June 16, 2016 3:37 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Stemming



Hi Jamal,

Snowball requires a lowercase filter above it.
This is documented in the javadocs, but it is a small yet important detail.
Please use a lowercase filter after the whitespace tokenizer.


Ahmet
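Why filter order matters can be shown with a toy chain: a whitespace tokenizer, an optional lowercase step, then a suffix stripper that, like Snowball, only recognizes lowercase suffixes. The stemmer here is a deliberately crude stand-in, not the Snowball algorithm.

```python
def toy_stem(token):
    """Toy suffix stripper that only recognizes lowercase suffixes."""
    if token.endswith("ing") and len(token) > 5:
        stem = token[:-3]
        if stem[-1] == stem[-2]:  # undo a doubled consonant: "runn" -> "run"
            stem = stem[:-1]
        return stem
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text, lowercase_first=True):
    """Whitespace tokenize, optionally lowercase, then stem each token."""
    tokens = text.split()
    if lowercase_first:
        tokens = [t.lower() for t in tokens]
    return [toy_stem(t) for t in tokens]
```

With lowercasing first, "RUNNING RUNS" becomes ["run", "run"]; without it, the uppercase tokens pass through unstemmed.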
On Thursday, June 16, 2016 10:13 PM, "Jamal, Sarfaraz" 
 wrote:



Hi Guys,

I have enabled stemming:
  




  

In the Admin Analysis, I type in running or runs and they both break down to 
run.
However when I search for run, runs, or running with an actual query -

It brings back three different sets of results.

Is that correct?

I would imagine that all three would bring back the exact same resultset?

Sas 


Stemming

2016-06-16 Thread Jamal, Sarfaraz
Hi Guys,

I have enabled stemming:
  




  

In the Admin Analysis, I type in running or runs and they both break down to 
run.
However when I search for run, runs, or running with an actual query -

It brings back three different sets of results.

Is that correct?

I would imagine that all three would bring back the exact same resultset?

Sas 



RE: [E] Re: Question(s) about Highlighting

2016-06-15 Thread Jamal, Sarfaraz
Update on this:

I feel I have a good grasp of synonyms:

In that I am doing it only at query time and not at indexing time

It looks like this in Synonyms.txt
sarfaraz jamal,sasjamal, sas,sarfaraz,wiggidy

Each one of those bring back the exact same records.

However it only highlights Jamal (with a space in front of it) 

Is there a way I can get the highlight snippets for each of the 4 synonyms of 
each other?

Thank you !

Sas


-Original Message-
From: Jamal, Sarfaraz [mailto:sarfaraz.ja...@verizonwireless.com.INVALID] 
Sent: Friday, June 3, 2016 9:52 AM
To: solr-user@lucene.apache.org
Subject: RE: [E] Re: Question(s) about Highlighting

Good Morning Alessandro,

I verified it through the analysis tool (thanks for pointing it out), and it 
appears to be working correctly - As I see all of them as being synonyms of 
each other for this entry:

sasjamal, sarfaraz, sas

- When I do it only at indexing time, and disable it during query time (editing 
the synonyms.txt file SOLR6) - It does not treat them equally

When I do it at indexing and query time, it seems to work - but the highlight 
snippets stop working.

I believe it is working, MINUS the highlighting/snippets if that makes sense?

Thanks

Sarfaraz Jamal (Sas)
Revenue Assurance Tech Ops
614-560-8556
sarfaraz.ja...@verizonwireless.com

-Original Message-
From: Alessandro Benedetti [mailto:abenede...@apache.org]
Sent: Thursday, June 2, 2016 5:41 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Question(s) about Highlighting

Hi Jamal,
I assume you are using the Synonym token filter.
From the observation I can assume you are using it only at indexing time.
This means that when you index:

1) given a row in synonyms.txt, any of the terms in the row is indexed as all 
the terms of that row.

2) given any term on the left side of an expression, the term on the right 
side of the expression is indexed in its place.

You can verify this easily with the analysis tool in the Solr UI .
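The two synonyms.txt forms Alessandro describes can be modeled with a toy parser: a comma-separated row makes every term map to all terms in the row, while "a, b => c" replaces each left-hand term with the right-hand side. This assumes expand=true and ignores case folding, escaping and multi-word handling; it is not Solr's actual parser. It shows why, with the explicit mapping applied only at index time, "sasjamal" never reaches the index.

```python
def _terms(text):
    """Split a comma-separated list of terms, dropping blanks."""
    return [t.strip() for t in text.split(",") if t.strip()]


def parse_synonyms(lines):
    """Map each term to the term(s) it is indexed as."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            left, right = line.split("=>", 1)
            for term in _terms(left):
                mapping.setdefault(term, []).extend(_terms(right))
        else:
            row = _terms(line)
            for term in row:
                mapping.setdefault(term, []).extend(row)
    return mapping
```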



On Thu, Jun 2, 2016 at 7:50 PM, Jamal, Sarfaraz < 
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> I am having some difficulty understanding how to do something and if 
> it is even possible
>
> I have tried the following sets of Synonyms:
>
> 1.  sarfaraz, sas, sasjamal
> 2.  sasjamal,sas => Sarfaraz
>
> In the second instance, any searches with the word 'sasjamal' do not 
> appear in the results, as it has been converted to Sarfaraz (I
> believe) -
>

This means you don't use the same synonyms.txt at query time. Indeed, sasjamal is 
not in the index at all.


> In the first instance it works better - I believe all instances of any 
> of those words  appear in the results. However the highlighted 
> snippets also stop working when any of those words are Matched. Is 
> there any documentation, insights or help about this issue?
>

I should verify that; it could be related to the term offsets.
Please take a look at the analysis tool as well to better understand how the 
offsets are assigned.
I remember that a long time ago there was a discussion about it, and a bug or 
something similar was raised.

Cheers

>
> Thanks in advance,
>
> Sas
>
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Thursday, June 2, 2016 2:43 PM
> To: solr-user@lucene.apache.org
> Subject: [E] Re: MongoDB and Solr - Massive re-indexing
>
> On 6/2/2016 11:56 AM, Robert Brown wrote:
> > My question is whether sending batches of 1,000 documents to Solr is 
> > still beneficial (thinking about docs that may not change), or if I 
> > should look at the MongoDB connector for Solr, based on the volume 
> > of incoming data we see.
> >
> > Would the connector still see all docs updating if I re-insert them 
> > blindly, and thus still send all 50m documents back to Solr every day 
> > anyway?
> >
> > Is my setup quite typical for the MongoDB connector?
>
> Sending update requests to Solr containing batches of 1000 docs is a 
> good idea.  Depending on how large they are, you may be able to send 
> even more than 1000.  If you can avoid sending documents that haven't 
> changed, Solr will likely perform better and relevance scoring will be 
> better, because you won't have as many deleted docs.
>
> The mongo connector is not software from the Solr project, or even 
> from Apache.  We don't know anything about it.  If you have questions 
> about that software, please contact the people who maintain it.  If 
> their answers lead to questions about Solr itself, then you can bring those 
> back here.
>
> Thanks,
> Shawn
>
>


--
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


RE: [E] Re: Question about Data Import Handler

2016-06-13 Thread Jamal, Sarfaraz
Sorry if I missed any replies on this (I was looking out for 
them) -

Is what I am trying to do even possible?

Thanks,

Sas

-Original Message-
From: Jamal, Sarfaraz [mailto:sarfaraz.ja...@verizonwireless.com.INVALID] 
Sent: Thursday, June 9, 2016 12:43 PM
To: solr-user 
Subject: RE: [E] Re: Question about Data Import Handler

I am on SOLR6 =)

Thanks,

Sas

-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
Sent: Thursday, June 9, 2016 12:42 PM
To: solr-user 
Subject: [E] Re: Question about Data Import Handler

which version of Solr do you run?

On Thu, Jun 9, 2016 at 6:23 PM, Jamal, Sarfaraz < 
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> Hi Guys,
>
> I have a question about the data import handler and its configuration 
> file
>
> This is what a part of my data-config looks like:
>
>
> 
> 
>
> 
> 
>   
> ===
>
> I would like it so that when it's indexed, it returns the following XML 
> for that doc.
>
> -
> This Is my name
> This is my description 
>
> The best I have gotten it to do so far is to add to the values in name 
> and description, which are fields on the doc.
>
> Thanks for any help -
>
> P.S. I shall be replying to the other threads as well; I just took a 
> break from them to work on another part of SOLR.
>
> Sas
>



--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>



RE: [E] Re: Question about Data Import Handler

2016-06-09 Thread Jamal, Sarfaraz
I am on SOLR6 =)

Thanks,

Sas

-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: Thursday, June 9, 2016 12:42 PM
To: solr-user 
Subject: [E] Re: Question about Data Import Handler

which version of Solr do you run?

On Thu, Jun 9, 2016 at 6:23 PM, Jamal, Sarfaraz < 
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> Hi Guys,
>
> I have a question about the data import handler and its configuration 
> file
>
> This is what a part of my data-config looks like:
>
>
> 
> 
>
> 
> 
>   
> ===
>
> I would like it so that when it's indexed, it returns the following XML 
> for that doc.
>
> -
> This Is my name
> This is my description 
>
> The best I have gotten it to do so far is to add to the values in name 
> and description, which are fields on the doc.
>
> Thanks for any help -
>
> P.S. I shall be replying to the other threads as well; I just took a 
> break from them to work on another part of SOLR.
>
> Sas
>



--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>



Question about Data Import Handler

2016-06-09 Thread Jamal, Sarfaraz
Hi Guys,

I have a question about the data import handler and its configuration file

This is what a part of my data-config looks like:









===

I would like it so that when it's indexed, it returns the following XML for 
that doc.

-
This Is my name
This is my description


The best I have gotten it to do so far is to add to the values in name and 
description, which are fields on the doc.

Thanks for any help -

P.S. I shall be replying to the other threads as well; I just took a break from 
them to work on another part of SOLR.

Sas


Stemming Help

2016-06-03 Thread Jamal, Sarfaraz
Hi Guys,

I am following this tutorial:
http://thinknook.com/keyword-stemming-and-lemmatisation-with-apache-solr-2013-08-02/

My (Managed) Schema file looks like this: (in the appropriate places)


-  

-   




  

 -  

-

I have re-indexed everything -

It is not affecting my search at all -

- from what I can tell from the analysis tool nothing is happening.

Is there something else I am missing or should take a look at, or is it 
possible to debug this? Or is there some other documentation I can search through?

Thanks!

Sas

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Friday, June 3, 2016 2:02 PM
To: solr-user@lucene.apache.org
Subject: Re: [E] Re: Stemming and Managed Schema

On 6/3/2016 9:22 AM, Jamal, Sarfaraz wrote:
> I would edit the managed-schema, make my changes, shutdown solr? And 
> start it back up and verify it is still there?

That's the sledgehammer approach.  Simple and effective, but Solr does go 
offline for a short time.

> Or is there another way to reload the core/collection?

For SolrCloud:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2

For non-cloud mode:
https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-RELOAD

Thanks,
Shawn
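The two reload routes Shawn links to boil down to a single HTTP GET each. A minimal sketch that builds the URLs; the base URL and core or collection names are hypothetical.

```python
from urllib.parse import urlencode


def reload_url(solr_base, name, cloud=False):
    """RELOAD URL: Collections API in SolrCloud, CoreAdmin API otherwise."""
    if cloud:
        return f"{solr_base}/admin/collections?" + urlencode({"action": "RELOAD", "name": name})
    return f"{solr_base}/admin/cores?" + urlencode({"action": "RELOAD", "core": name})
```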



RE: [E] Re: Stemming and Managed Schema

2016-06-03 Thread Jamal, Sarfaraz
Awesome,

So just to make sure I got it right:

I would edit the managed-schema, make my changes, shutdown solr? And start it 
back up and verify it is still there?

Or is there another way to reload the core/collection?

Thanks!

Sas



-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Friday, June 3, 2016 11:17 AM
To: solr-user@lucene.apache.org
Subject: [E] Re: Stemming and Managed Schema

On 6/3/2016 9:07 AM, Jamal, Sarfaraz wrote:
> I found the following article:
> http://thinknook.com/keyword-stemming-and-lemmatisation-with-apache-solr-2013-08-02/
>
> And I want to do stemming on one of our fields.
>
> However, I am using a Managed Schema and I am unsure how to add these 
> two blocks to it -
>
> I know there is an API for managed schemas, would that support these 
> additions?

You can't edit an existing fieldType with the Schema API.  You can entirely 
replace it, but you have to include the whole definition.

https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-ReplaceaFieldType

I'm aware that the managed-schema file says to not make manual edits -- but you 
*can* edit it manually, as long as you are absolutely sure that nobody is using 
the Schema API until after you complete your edits and reload the 
core/collection.

Thanks,
Shawn
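With the Schema API, "replace-field-type" takes the complete new definition, as Shawn notes (use "add-field-type" if the type does not exist yet). A minimal sketch that builds (but does not send) such a request. The text_stem definition is hypothetical, with the lowercase filter placed before the Snowball stemmer.

```python
import json
import urllib.request

# Hypothetical stemming type: whitespace tokenizer, lowercase, then Snowball.
TEXT_STEM = {
    "name": "text_stem",
    "class": "solr.TextField",
    "analyzer": {
        "tokenizer": {"class": "solr.WhitespaceTokenizerFactory"},
        "filters": [
            {"class": "solr.LowerCaseFilterFactory"},
            {"class": "solr.SnowballPorterFilterFactory", "language": "English"},
        ],
    },
}


def build_replace_field_type(solr_base, collection, field_type):
    """Build a Schema API request that replaces a whole fieldType definition."""
    body = json.dumps({"replace-field-type": field_type}).encode("utf-8")
    return urllib.request.Request(f"{solr_base}/{collection}/schema", data=body,
                                  method="POST",
                                  headers={"Content-Type": "application/json"})
```

Existing documents still need to be reindexed after the analyzer changes.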



Stemming and Managed Schema

2016-06-03 Thread Jamal, Sarfaraz
Hi Guys,

I found the following article:
http://thinknook.com/keyword-stemming-and-lemmatisation-with-apache-solr-2013-08-02/

And I want to do stemming on one of our fields.

However, I am using a Managed Schema and I am unsure how to add these two 
blocks to it -

I know there is an API for managed schemas, would that support these additions?

Thanks!

Sas


RE: [E] Re: Question(s) about Highlighting

2016-06-03 Thread Jamal, Sarfaraz
Good Morning Alessandro,

I verified it through the analysis tool (thanks for pointing it out), and it 
appears to be working correctly - As I see all of them as being synonyms of 
each other for this entry:

sasjamal, sarfaraz, sas

- When I do it only at indexing time, and disable it during query time (editing 
the synonyms.txt file SOLR6) -
It does not treat them equally

When I do it at indexing and query time, it seems to work - but the highlight 
snippets stop working.

I believe it is working, MINUS the highlighting/snippets if that makes sense?

Thanks

Sarfaraz Jamal (Sas)
Revenue Assurance Tech Ops
614-560-8556
sarfaraz.ja...@verizonwireless.com

-Original Message-
From: Alessandro Benedetti [mailto:abenede...@apache.org] 
Sent: Thursday, June 2, 2016 5:41 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Question(s) about Highlighting

Hi Jamal,
I assume you are using the Synonym token filter.
From the observation I can assume you are using it only at indexing time.
This means that when you index:

1) given a row in synonyms.txt, any of the terms in the row is indexed as all 
the terms of that row.

2) given any term on the left side of an expression, the term on the right 
side of the expression is indexed in its place.

You can verify this easily with the analysis tool in the Solr UI .



On Thu, Jun 2, 2016 at 7:50 PM, Jamal, Sarfaraz < 
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> I am having some difficulty understanding how to do something and if 
> it is even possible
>
> I have tried the following sets of Synonyms:
>
> 1.  sarfaraz, sas, sasjamal
> 2.  sasjamal,sas => Sarfaraz
>
> In the second instance, any searches with the word 'sasjamal' do not 
> appear in the results, as it has been converted to Sarfaraz (I 
> believe) -
>

This means you don't use the same synonyms.txt at query time. Indeed, sasjamal is 
not in the index at all.


> In the first instance it works better - I believe all instances of any 
> of those words appear in the results. However the highlighted 
> snippets also stop working when any of those words are matched. Is 
> there any documentation, insights or help about this issue?
>

I would need to verify that; it could be related to the term offsets.
Please also take a look at the analysis tool to better understand how the 
offsets are assigned.
I remember that a long time ago there was a discussion about it and a bug or 
something similar was raised.
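The same offset information can, I believe, also be requested over HTTP from the field analysis handler, which makes it easier to compare index-time and query-time tokens. A minimal Python sketch that only builds the request URL, assuming the /analysis/field handler (solr.FieldAnalysisRequestHandler) is available as in the stock example configs; the host, core name and field type are hypothetical.

```python
from urllib.parse import urlencode


def analysis_url(base, core, field_type, field_value, query=None):
    """Build a /analysis/field request URL.

    The JSON response lists the tokens produced at each analysis stage,
    including their start/end offsets, which highlighting relies on.
    """
    params = {
        "analysis.fieldtype": field_type,    # analyze with this field type's chain
        "analysis.fieldvalue": field_value,  # text run through the index analyzer
        "wt": "json",
    }
    if query is not None:
        params["analysis.query"] = query     # text run through the query analyzer
    return f"{base}/solr/{core}/analysis/field?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, core and field type.
    print(analysis_url("http://localhost:8983", "mycore", "text_general",
                       "sasjamal was here", query="sas"))
```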

Cheers

>
> Thanks in advance,
>
> Sas
>
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Thursday, June 2, 2016 2:43 PM
> To: solr-user@lucene.apache.org
> Subject: [E] Re: MongoDB and Solr - Massive re-indexing
>
> On 6/2/2016 11:56 AM, Robert Brown wrote:
> > My question is whether sending batches of 1,000 documents to Solr is 
> > still beneficial (thinking about docs that may not change), or if I 
> > should look at the MongoDB connector for Solr, based on the volume 
> > of incoming data we see.
> >
> > Would the connector still see all docs updating if I re-insert them 
> > blindly, and thus still send all 50m documents back to Solr every day 
> > anyway?
> >
> > Is my setup quite typical for the MongoDB connector?
>
> Sending update requests to Solr containing batches of 1000 docs is a 
> good idea.  Depending on how large they are, you may be able to send 
> even more than 1000.  If you can avoid sending documents that haven't 
> changed, Solr will likely perform better and relevance scoring will be 
> better, because you won't have as many deleted docs.
>
> The mongo connector is not software from the Solr project, or even 
> from Apache.  We don't know anything about it.  If you have questions 
> about that software, please contact the people who maintain it.  If 
> their answers lead to questions about Solr itself, then you can bring those 
> back here.
>
> Thanks,
> Shawn
>
>


--
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Question(s) about Highlighting

2016-06-02 Thread Jamal, Sarfaraz
I am having some difficulty understanding how to do something, and whether it is even 
possible.

I have tried the following sets of Synonyms:

1.  sarfaraz, sas, sasjamal
2.  sasjamal,sas => Sarfaraz

In the second instance, searches for the word 'sasjamal' do not return those 
documents, as it has been converted to Sarfaraz (I believe).
In the first instance it works better; I believe all instances of any of those 
words appear in the results. However, the highlighted snippets stop working 
when any of those words are matched. Is there any documentation, insight or 
help about this issue?

Thanks in advance,

Sas


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Thursday, June 2, 2016 2:43 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: MongoDB and Solr - Massive re-indexing

On 6/2/2016 11:56 AM, Robert Brown wrote:
> My question is whether sending batches of 1,000 documents to Solr is 
> still beneficial (thinking about docs that may not change), or if I 
> should look at the MongoDB connector for Solr, based on the volume of 
> incoming data we see.
>
> Would the connector still see all docs updating if I re-insert them 
> blindly, and thus still send all 50m documents back to Solr every day 
> anyway?
>
> Is my setup quite typical for the MongoDB connector?

Sending update requests to Solr containing batches of 1000 docs is a good idea. 
Depending on how large they are, you may be able to send even more than 1000.  
If you can avoid sending documents that haven't changed, Solr will likely 
perform better and relevance scoring will be better, because you won't have as 
many deleted docs.
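One way to avoid re-sending unchanged documents is to keep a content fingerprint per document id and only batch the ones that changed. A minimal Python sketch under that assumption: the fingerprint store (seen) is a plain dict here, documents are JSON-serializable dicts with an "id" key, and each yielded batch would then be POSTed as a JSON array to the collection's /update handler.

```python
import hashlib
import json


def fingerprint(doc):
    """Stable SHA-256 of a document; sort_keys makes field order irrelevant."""
    payload = json.dumps(doc, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def changed_batches(docs, seen, batch_size=1000):
    """Yield lists of at most batch_size docs whose content changed.

    seen maps doc id -> fingerprint from the previous run and is updated
    in place as documents are batched.
    """
    batch = []
    for doc in docs:
        fp = fingerprint(doc)
        if seen.get(doc["id"]) == fp:
            continue  # unchanged: skipping it avoids creating another deleted doc
        seen[doc["id"]] = fp
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

Recording the fingerprint before Solr has acknowledged the batch is a simplification; a real job would update the store only after a successful response.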

The mongo connector is not software from the Solr project, or even from Apache. 
We don't know anything about it.  If you have questions about that software, 
please contact the people who maintain it.  If their answers lead to questions 
about Solr itself, then you can bring those back here.

Thanks,
Shawn



RE: [E] Re: Faceting Question(s)

2016-06-02 Thread Jamal, Sarfaraz
Thank you Andrew, that looks like exactly what I am looking for =)
Thank you Robert, it looks like we are both doing it in similar fashion =)
Thank you MaryJo for jumping right in!

Sas



-Original Message-
From: Andrew Chillrud [mailto:achill...@opentext.com] 
Sent: Thursday, June 2, 2016 2:17 PM
To: solr-user@lucene.apache.org
Subject: RE: [E] Re: Faceting Question(s)

It is possible to get the original facet counts for the field you are filtering 
on (we have been using this since Solr 3.6). I don't know if this can be extended 
to get the original counts for all fields, however. 

This syntax is described here: 
https://cwiki.apache.org/confluence/display/solr/Faceting

Tagging and Excluding Filters

You can tag specific filters and exclude those filters when faceting. This is 
useful when doing multi-select faceting.

Consider the following example query with faceting:

q=mainquery&fq=status:public&fq=doctype:pdf&facet=true&facet.field=doctype

Because everything is already constrained by the filter doctype:pdf, the 
facet.field=doctype facet command is currently redundant and will return 0 
counts for everything except doctype:pdf.

To implement a multi-select facet for doctype, a GUI may want to still display 
the other doctype values and their associated counts, as if the doctype:pdf 
constraint had not yet been applied. For example:
=== Document Type ===
  [ ] Word (42)
  [x] PDF  (96)
  [ ] Excel(11)
  [ ] HTML (63)

To return counts for doctype values that are currently not selected, tag 
filters that directly constrain doctype, and exclude those filters when 
faceting on doctype.

q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype

Filter exclusion is supported for all types of facets. Both the tag and ex 
local parameters may specify multiple values by separating them with commas.
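Applied to the team facet from this thread, the tagged query string could be built like this. A minimal Python sketch; it assumes simple facet values that need no quoting and uses each field name as its own tag name, which is a naming choice for illustration, not something Solr requires.

```python
from urllib.parse import urlencode


def multi_select_query(q, facet_fields, selected):
    """Build a query string for multi-select faceting.

    selected maps a facet field to the value the user picked. Each such filter
    is tagged with its field name and excluded when counting that field, so the
    facet shows counts as if its own filter had not been applied.
    """
    params = [("q", q), ("facet", "true")]
    for field, value in selected.items():
        params.append(("fq", f"{{!tag={field}}}{field}:{value}"))
    for field in facet_fields:
        if field in selected:
            params.append(("facet.field", f"{{!ex={field}}}{field}"))
        else:
            params.append(("facet.field", field))
    return urlencode(params)


if __name__ == "__main__":
    # Prints the URL-encoded form of:
    # q=video&facet=true&fq={!tag=team}team:rollback&facet.field={!ex=team}team
    print(multi_select_query("video", ["team"], {"team": "rollback"}))
```

Other facet fields keep their normal, filtered counts, while each selected field is counted as if its own filter were absent.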

- Andy -

-Original Message-
From: Robert Brown [mailto:r...@intelcompute.com]
Sent: Thursday, June 02, 2016 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: [E] Re: Faceting Question(s)

MaryJo, I think you've misunderstood.  The counts are different simply because 
the 2nd query contains a filter on a facet value from the 1st query; that's 
completely expected.

The issue is how to get the original facet counts (with no filters but same q) 
in the same call as also filtering by one of those facet values.

Personally I don't think it's possible, but I will be interested to hear others' 
input, since it's a very common situation for me. I cache the first result in 
memcached and tag future queries as related to the first.

Or you could always make 2 calls to Solr (the original one again, and one 
with the filters); the caches should help massively.
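The two-call approach can be kept cheap on the client side as well. A minimal Python sketch that memoizes the unfiltered, rows=0 facet call per q in process memory; fetch_facets is a hypothetical function standing in for whatever actually calls Solr, and memcached would play the same role across processes.

```python
from functools import lru_cache


def make_facet_lookup(fetch_facets, maxsize=1024):
    """Return a function that runs fetch_facets(q) once per distinct q.

    fetch_facets is a hypothetical callable that sends the unfiltered,
    rows=0 facet query for q and returns its facet counts.
    """
    @lru_cache(maxsize=maxsize)
    def unfiltered_counts(q):
        return fetch_facets(q)

    return unfiltered_counts
```

Cached counts go stale when the index changes, so calling cache_clear() on the returned function after a commit is one simple way to keep them in step.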



On 02/06/16 19:07, MaryJo Sminkey wrote:
> And you're saying the count for the second query is different than 
> what was returned in the facet? You may need to check for any defaults 
> you have set up in the solrconfig for the select handler. If, for 
> instance, you have any grouping going on but aren't doing grouping in 
> your facet, that could result in the counts being off.
>
> MJ
>
>
>
>
> On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz < 
> sarfaraz.ja...@verizonwireless.com.invalid> wrote:
>
>> Absolutely,
>>
>> Here is what it looks like:
>>
>> This brings the right counts as it should
>> http://**select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team
>>
>> Then when I specify which team
>> http://**select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team&fq=team:rollback
>>
>> The counts are obviously different now, as the result set is limited 
>> to one team.
>>
>> Sas
>>
>> -Original Message-
>> From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
>> Sent: Thursday, June 2, 2016 1:56 PM
>> To: solr-user@lucene.apache.org
>> Subject: [E] Re: Faceting Question(s)
>>
>> Jamal - what is your q= set to? And do you have an fq for the original 
>> query? I have found that if you do a wildcard search (*:*) you have 
>> to be careful about other parameters you set, as that can often result 
>> in the numbers returned being off. In my case, my defaults had things 
>> like edismax settings for phrase boosting, etc. that don't apply if 
>> there isn't a search term, and once I removed those for a wildcard 
>> search I got the correct numbers. So possibly your facet query itself 
>> is set up correctly but something else in the parameters and/or 
>> filters with the two queries may be the cause of the difference.
>>
>> Mary Jo
>>
>>
>> 

RE: [E] Re: Faceting Question(s)

2016-06-02 Thread Jamal, Sarfaraz
Absolutely,

Here is what it looks like:

This returns the right counts, as it should:
http://**select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team

Then, when I specify which team:
http://**select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team&fq=team:rollback

The counts are obviously different now, as the result set is limited to one 
team.

Sas

-Original Message-
From: MaryJo Sminkey [mailto:mjsmin...@gmail.com] 
Sent: Thursday, June 2, 2016 1:56 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Faceting Question(s)

Jamal - what is your q= set to? And do you have an fq for the original query? I 
have found that if you do a wildcard search (*:*) you have to be careful about 
other parameters you set, as that can often result in the numbers returned being 
off. In my case, my defaults had things like edismax settings for phrase 
boosting, etc. that don't apply if there isn't a search term, and once I 
removed those for a wildcard search I got the correct numbers. So your 
facet query itself may be set up correctly, but something else in the parameters 
and/or filters of the two queries may be causing the difference.

Mary Jo


On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz < 
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> Hello Everyone,
>
> I am working on implementing some basic faceting into my project.
>
> I have it working the way I want to, but I feel like there is probably 
> a better way than the way I went about it.
>
> * I want to show a category and its count.
> * when someone clicks a category, it sets a FQ= to that category.
>
> But now that the results are being filtered, the category counts from 
> the original query without the filters are off.
>
> So, I have a single api call that I make with rows set to 0 and the 
> base query without any filters, and use that to display my categories.
>
> And then I call the api again, this time to get the results. And the 
> category count is the same.
>
> I hope that makes sense.
>
> I was hoping  facet.query would be of help, but I am not sure I 
> understood it properly.
>
> Thanks in advance =)
>
> Sas
>


Faceting Question(s)

2016-06-02 Thread Jamal, Sarfaraz
Hello Everyone,

I am working on implementing some basic faceting into my project.

I have it working the way I want to, but I feel like there is probably a better 
way than the way I went about it.

* I want to show a category and its count.
* when someone clicks a category, it sets a FQ= to that category.

But now that the results are being filtered, the category counts from the 
original query without the filters are off.

So I make one API call with rows set to 0 and the base query without any 
filters, and use that to display my categories.

Then I call the API again, this time to get the results, and the category 
counts stay the same.

I hope that makes sense.

I was hoping facet.query would help, but I am not sure I understood it 
properly.

Thanks in advance =)

Sas


RE: [E] Re: Simple Question about SimplePostTool

2016-06-01 Thread Jamal, Sarfaraz
Thank you.

Sas

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Wednesday, June 1, 2016 4:34 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Simple Question about SimplePostTool

Yes, you can add “literal” field values with bin/post:

   bin/post -c test ~/Documents/Test.pdf -params "literal.foo=bar"

See 
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika#UploadingDatawithSolrCellusingApacheTika-InputParameters
for details on what parameters you can use with “rich document” indexing.
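For the team=avengers example from the original question, the bin/post form would be -params "literal.team=avengers". For rich documents, bin/post sends the file to the /update/extract handler; a minimal Python sketch of building that request URL with literal.* parameters, assuming the schema has a team field (or a matching dynamic field) and that the unique key is handled separately. The host is hypothetical, and the file itself would go in the POST body.

```python
from urllib.parse import urlencode


def extract_url(base, collection, literals, commit=True):
    """Build an /update/extract URL that adds literal field values.

    Each entry in literals becomes a literal.<field>=<value> parameter;
    the document bytes are sent separately as the request body.
    """
    params = [("literal." + name, value) for name, value in literals.items()]
    if commit:
        params.append(("commit", "true"))
    return f"{base}/solr/{collection}/update/extract?{urlencode(params)}"


if __name__ == "__main__":
    print(extract_url("http://localhost:8983", "test", {"team": "avengers"}))
    # http://localhost:8983/solr/test/update/extract?literal.team=avengers&commit=true
```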

—
Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com



> On Jun 1, 2016, at 3:28 PM, Jamal, Sarfaraz 
>  wrote:
> 
> Hi Guys,
> 
> I am a newbie at Solr, so I may have some very simple questions.
> I am also waiting for my book to arrive.
> 
> Can the SimplePostTool be used to add additional fields when indexing a 
> Word/Excel/text file?
> 
> So, for example, as I index a word document, I pass in a parameter 
> saying team=avengers
> 
> Or something along the lines of that -
> 
> Thank you,
> 
> Sas



Simple Question about SimplePostTool

2016-06-01 Thread Jamal, Sarfaraz
Hi Guys,

I am a newbie at Solr, so I may have some very simple questions.
I am also waiting for my book to arrive.

Can the SimplePostTool be used to add additional fields when indexing a 
Word/Excel/text file?

So, for example, as I index a word document, I pass in a parameter saying 
team=avengers

Or something along the lines of that -

Thank you,

Sas