Analysers for newspaper pages...

Dawn Zoë Raison Mon, 28 Nov 2011 11:11:20 -0800

Hi folks,

I'm researching the best options for analysing/storing newspaper pages in our online archive, and wondered if anyone has any hints or tips on good practice for this type of media?

I'm currently thinking along the lines of using a customised StandardAnalyser (no stop words + extra date token detection) wrapped with a Shingle filter and finally a Stopword filter - the thinking being that this should reduce the impact of stop words but still allow "to be or not to be" searches...
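To make the idea concrete, here is a rough plain-Java sketch of what I'd expect that chain to emit. It does not use Lucene at all: the class ShingleSketch, its method names and the tiny stop list are made up purely for illustration, it only builds two-word shingles, and the real Shingle and Stopword filters may well differ in detail (position handling, for instance).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; this is NOT Lucene code.
final class ShingleSketch {

    // Hypothetical, deliberately tiny stop list for the example.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("to", "be", "or", "not", "the", "a", "of"));

    // Step 1: lower-case and split on non-alphanumerics, keeping stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 2: emit each single token, followed by the two-word shingle
    // that starts at that token (if there is a next token).
    static List<String> shingle(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));
            if (i + 1 < tokens.size()) {
                out.add(tokens.get(i) + " " + tokens.get(i + 1));
            }
        }
        return out;
    }

    // Step 3: drop terms that are exactly a stop word. Shingles such as
    // "to be" are not in the stop list, so they survive.
    static List<String> dropStopWords(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            if (!STOP_WORDS.contains(term)) {
                out.add(term);
            }
        }
        return out;
    }

    // The whole chain: tokenize -> shingle -> stop-word removal.
    static List<String> analyse(String text) {
        return dropStopWords(shingle(tokenize(text)));
    }

    public static void main(String[] args) {
        // prints [to be, be or, or not, not to, to be]
        System.out.println(analyse("To be, or not to be"));
    }
}
```

So the bare stop words disappear from the index, but a phrase made up entirely of stop words is still findable through its shingles, while ordinary words like "times" in "The Times of London" are kept both on their own and inside shingles.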


A future aim is to add a synonym filter at search time.

We currently have ~2.5 million pages - some of the older broadsheet pages can have a serious number of tokens. We currently index using the SimpleAnalyser - a hangover from the previous developers I hope to remedy :-).


--

Rgds.
*Dawn Raison*
Technical Director, Digitorial Ltd.
