Re: [dspace-tech] Re: Statistics ip lists obsolete and update script not working

2023-01-23 Thread Mark H. Wood
On Mon, Jan 23, 2023 at 04:17:07PM +0100, Claudia Jürgen wrote:
> I have not had time to look into it. Most of the IP lists are no longer
> free, so I wonder how we can clean up the statistics, for example by
> replacing them with a new source of lists and then flagging and removing
> the bots.

There is a PR for a new source:  https://github.com/DSpace/DSpace/pull/2892

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu

-- 
All messages to this mailing list should adhere to the Code of Conduct: 
https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/Y86teHjn6qpdcCZ6%40IUPUI.Edu.




Re: [dspace-tech] Re: Statistics ip lists obsolete and update script not working

2023-01-23 Thread Claudia Jürgen

Hi Karol and all,

I have not had time to look into it. Most of the IP lists are no longer
free, so I wonder how we can clean up the statistics, for example by
replacing them with a new source of lists and then flagging and removing
the bots.
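
To make the "flag the bots" step concrete, here is a minimal sketch of matching a client IP against list entries. It is not DSpace's own spider detection. The class name SpiderIpMatcher is hypothetical, and the list format is an assumption: one entry per line, '#' starting a comment, and each entry either a full IPv4 address or a leading prefix such as "198.51.100". The sample addresses come from the documentation ranges, not from any real bot list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not part of DSpace.
class SpiderIpMatcher {
    private final Set<String> exact = new HashSet<>();
    private final List<String> prefixes = new ArrayList<>();

    // Assumed list format: one entry per line, '#' starts a comment,
    // an entry is a full IPv4 address or a leading prefix.
    SpiderIpMatcher(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash).trim();
            }
            if (line.isEmpty()) {
                continue;
            }
            if (line.split("\\.").length == 4 && !line.endsWith(".")) {
                exact.add(line);
            } else {
                // Keep a trailing dot so "198.51.100" cannot match "198.51.1000.x"-style text
                // or a shorter octet such as "198.51.10.7".
                prefixes.add(line.endsWith(".") ? line : line + ".");
            }
        }
    }

    // True if the address equals a listed address or starts with a listed prefix.
    boolean isSpider(String ip) {
        if (exact.contains(ip)) {
            return true;
        }
        for (String prefix : prefixes) {
            if (ip.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SpiderIpMatcher matcher = new SpiderIpMatcher(Arrays.asList(
                "# made-up sample entries", "192.0.2.10", "198.51.100"));
        System.out.println(matcher.isSpider("192.0.2.10"));   // exact match
        System.out.println(matcher.isSpider("198.51.100.7")); // prefix match
        System.out.println(matcher.isSpider("203.0.113.5"));  // not listed
    }
}
```

Whatever new source is chosen, its format would have to be checked first; CIDR ranges, for instance, would need a different matcher than this prefix comparison.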

Sunny greetings

Claudia




On 21.01.2023 at 17:28, Karol wrote:

Hi Claudia,

I have exactly the same problem. UP.

Best,

Karol

On Tuesday, 17 January 2023 at 15:19:39 UTC+1, Claudia Jürgen wrote:

Hi all,

I noticed two things about the IP lists used by stats-util.

The lists are configured in:


https://github.com/DSpace/DSpace/blob/main/dspace/config/modules/solr-statistics.cfg 


solr-statistics.spiderips.urls = http://iplists.com/google.txt, \
http://iplists.com/inktomi.txt, \
http://iplists.com/lycos.txt, \
http://iplists.com/infoseek.txt, \
http://iplists.com/altavista.txt, \
http://iplists.com/excite.txt, \
http://iplists.com/misc.txt
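
For readers unfamiliar with the syntax: the trailing backslashes continue one property value across several lines, and the value is a comma-separated list of URLs. The sketch below shows how such a value splits into individual URLs. DSpace reads the file through its own configuration service; java.util.Properties is used here only as a stand-in that, as far as I know, treats the backslash continuation the same way. The class name SpiderUrlConfig is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration; DSpace itself does not load the file this way.
class SpiderUrlConfig {

    // Reads the property and splits its comma-separated value into trimmed URLs.
    static List<String> parseUrls(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("solr-statistics.spiderips.urls", "");
        List<String> urls = new ArrayList<>();
        for (String part : value.split(",")) {
            String url = part.trim();
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // A shortened copy of the setting quoted above.
        String cfg = "solr-statistics.spiderips.urls = http://iplists.com/google.txt, \\\n"
                   + "    http://iplists.com/inktomi.txt, \\\n"
                   + "    http://iplists.com/misc.txt\n";
        for (String url : parseUrls(cfg)) {
            System.out.println(url);
        }
    }
}
```

Switching to a new list source would then only mean changing this one property value.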


a) The lists are most likely obsolete, so the statistics are very
imprecise with regard to bot traffic:
https://iplists.com/
The last revision dates on the site are from 2008 and 2014.
Maybe we need another source for IP lists and a "cleanup".

b) stats-util -u (which should fetch updated list files) does not
work and throws an NPE:
Getting: http://iplists.com/google.txt
To: /opt/dspace/dspace63tu/config/spiders/iplists.com-google.txt
- Error: null
java.lang.NullPointerException
    at org.apache.tools.ant.taskdefs.Get.doGet(Get.java:221)
    at org.apache.tools.ant.taskdefs.Get.execute(Get.java:134)
    at org.dspace.statistics.util.StatisticsClient.updateSpiderFiles(StatisticsClient.java:152)
    at org.dspace.statistics.util.StatisticsClient.main(StatisticsClient.java:80)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
    at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)

Sunny Greetings

Claudia

--
Claudia Juergen

Technische Universität Dortmund
Universitätsbibliothek
Bibliotheks-IT
Vogelpothsweg 76
44227 Dortmund

Tel.: +49 231-755 40 43 
Fax: +49 231-755 40 32 
claudia...@tu-dortmund.de
www.ub.tu-dortmund.de 


Important note: The information included in this e-mail is
confidential. It is solely intended for the recipient. If you are
not the intended recipient of this e-mail please contact the sender
and delete this message. Thank you. Without prejudice of e-mail
correspondence, our statements are only legally binding when they
are made in the conventional written form (with personal signature)
or when such documents are sent by fax.


Re: [dspace-tech] Re: Statistics ip lists obsolete and update script not working

2023-01-23 Thread Mark H. Wood
The NullPointerException from stats-util -u seems to have been fixed in
https://github.com/DSpace/DSpace/issues/8528

The code relies on Ant's 'get' task to do the downloading.  It appears
that we have been playing fast and loose with Ant's infrastructure,
and a shortcut that used to work now fails.

This single use drags all of Ant into the DSpace runtime.  Maybe we
should be using Commons HttpComponents, which is found in a number of
places in DSpace, instead?
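
As a rough illustration of how small the download step is without Ant, here is a sketch. It is not the fix from the issue above, and it deliberately uses the JDK's built-in URLConnection rather than HttpComponents so that it compiles with no extra dependencies. The class SpiderListFetcher and both method names are hypothetical. The local file naming (host plus path, with '/' replaced by '-') is an assumption inferred from the "iplists.com-google.txt" target in the log quoted in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of DSpace.
class SpiderListFetcher {

    // Assumption: local name is host + path with '/' replaced by '-'.
    static String localFileName(String url) {
        URI uri = URI.create(url);
        return uri.getHost() + uri.getPath().replace('/', '-');
    }

    // Downloads one list into targetDir and returns the written file.
    // The old copy is only replaced after the download has completed.
    static Path fetch(String url, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(localFileName(url));
        Path tmp = Files.createTempFile(targetDir, "spiderlist", ".tmp");
        try {
            URLConnection conn = URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(10_000); // milliseconds
            conn.setReadTimeout(30_000);
            if (conn instanceof HttpURLConnection) {
                int status = ((HttpURLConnection) conn).getResponseCode();
                if (status != HttpURLConnection.HTTP_OK) {
                    throw new IOException("HTTP " + status + " for " + url);
                }
            }
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }

    public static void main(String[] args) throws IOException {
        String url = "http://iplists.com/google.txt";
        System.out.println("Local name: " + localFileName(url));
        if (args.length == 1) { // only touch the network when a directory is given
            System.out.println("Wrote: " + fetch(url, Paths.get(args[0])));
        }
    }
}
```

The status check matters because HttpURLConnection does not follow a redirect from http to https, so without it a redirect page could be saved as if it were the list.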



[dspace-tech] Re: Statistics ip lists obsolete and update script not working

2023-01-21 Thread Karol
Hi Claudia,

I have exactly the same problem. UP.

Best,

Karol
