[dspace-tech] Google analytics and DSpace

2017-04-10 Thread lukedevntl
We recently upgraded from DSpace 4 to 6 and we were pleased to get extra 
reporting such as download events and seeing in real time what is being 
downloaded.

However, our statistics now report about three months' worth of sessions 
every day, and I'm struggling to understand how this can be.

For instance, in the past 6 days we have apparently had 100,000 sessions but 
only 10,000 page views and 150,000 download events. Since the upgrade, direct 
traffic is the most common channel (which seems highly unlikely), and over 
99% of direct traffic (80,000 sessions) has "(not set)" as the landing 
page. I assume this is DSpace rather than real people?

The only interpretation I can make is that the session count is broken and 
that there really were 10,000 page views and 150,000 file downloads (probably 
reached from Google or another referrer without visiting any pages).

But I can't really assume something is broken and pick and choose stats to 
take seriously. Does anyone have any ideas about what is going on?

Thanks.
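
For anyone who wants to sanity-check the figures quoted in this message, here is a minimal sketch of the arithmetic in Python. The numbers are the ones reported above; the variable names are illustrative only, and nothing here queries Google Analytics.

```python
# Figures quoted in the message above (past 6 days).
sessions = 100_000
page_views = 10_000
download_events = 150_000
direct_not_set_sessions = 80_000  # direct sessions whose landing page is "(not set)"


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


pages_per_session = ratio(page_views, sessions)
downloads_per_page_view = ratio(download_events, page_views)
not_set_share = ratio(direct_not_set_sessions, sessions)

print(f"page views per session:        {pages_per_session:.2f}")
print(f"download events per page view: {downloads_per_page_view:.1f}")
print(f"'(not set)' share of sessions: {not_set_share:.0%}")
```

With fewer than one page view per session, most sessions cannot contain a page view at all, which would fit the large number of "(not set)" landing pages.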

-- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To post to this group, send email to dspace-tech@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.


[dspace-tech] Re: BookReader integration in DSpace + fulltext searching inside the document

2017-04-10 Thread Pedro Amorim
Hello all,

After a bit of prototyping I decided to go with SOLR dynamic fields and 
querying SOLR with fl (field list) returning only the field (or word) 
requested.
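
As a rough illustration of the approach described above, the following Python sketch builds a Solr select URL that uses fl to return a single field. The host, the item id and the "<word>_word" dynamic-field naming scheme are assumptions made up for this example; only q, fl and wt are standard Solr parameters, and search.resourceid is the field used in the query quoted further down.

```python
from urllib.parse import urlencode


def build_word_query(base_url: str, item_id: str, word: str) -> str:
    """Build a Solr select URL that returns only the dynamic field for `word`."""
    # Hypothetical naming scheme: one dynamic field per word, e.g. "socrates_word".
    field_name = f"{word.lower()}_word"
    params = {
        "q": f"search.resourceid:{item_id}",  # restrict the query to one item
        "fl": field_name,                     # field list: return just this field
        "wt": "json",                         # ask Solr for a JSON response
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and item id, for illustration only.
url = build_word_query("http://localhost:8080/solr/search", "1234", "Socrates")
print(url)
```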

However, I first need to override the ImageMagickPdfThumbnailFilter 
media-filter in order to not only create a small thumbnail, but also create 
a large thumbnail for every page in the respective PDF file, and store 
those new JPEG files in a new custom bundle.

After taking a look at mediafilter/MediaFilterServiceImpl.java, I realized 
it's not very flexible (it's not easy to create multiple files out of just 
one), so maybe the best route here would be to override the postProcess 
method and have it create the page images after the first small thumbnail 
is completed.

If anyone has implemented postProcessing on any media filter before please 
do advise, as I didn't find much snooping around.

Again, thanks.

Pedro Amorim

On Thursday, 6 April 2017 at 17:23:34 UTC, Pedro Amorim wrote:
>
> Hello everyone,
>
> I'm currently trying to plan an implementation of this and wanted to ask 
> the opinion of the developers on how to go about it.
>
> I have seen the great resources provided by Peter Dietz@LongSight 
> regarding the integration of BookReader in DSpace, such as the video 
> demo and the source code.
> And all works great, provided that the bitstreams contained in the item 
> follow a specific nomenclature (001.jpg, 002.jpg, 003.jpg, etc) so that the 
> client app can request/render them in the correct order and request page 
> ranges, etc.
>
> However, the feature of searching within the document itself is disabled, 
> because - I believe - this particular feature needs a backend to supply the 
> client app with the needed information.
> This can be seen in production at archive.org, for example when searching 
> for the term *Socrates* within a book.
>
> The backend of the Internet Archive's BookReader returns a JSON entry for 
> every hit, for example:
>
> {
>     "text": "fly towards him, nestle in his breast, and then spread its wings and soai upwards, singing most sweetly The next morning Ariston appeared, leading his son Plato to the philosopher, and {{{Socrates}}} knew that his dieam was fulfilled",
>     "par": [
>         {
>             "boxes": [
>                 {
>                     "r": 694,
>                     "b": 412,
>                     "t": 358,
>                     "page": 10,
>                     "l": 531
>                 }
>             ],
>             "b": 463,
>             "t": 172,
>             "page_width": 1243,
>             "r": 1146,
>             "l": 28,
>             "page_height": 2123,
>             "page": 10
>         }
>     ]
> }
>
> This makes sense because with this info the client app can *1)* correctly 
> pinpoint the specific pages where the term is found and *2)* correctly 
> render the highlight box around the searched term within the page being 
> presented using the 'coordinates' and dimensions.
>
> *Assuming:*
> 1) Have all the required bitstreams in jpeg format and in the correct 
> naming convention mentioned above;
> 2) Have the required word location information in ALTO.xml files (DSpace 
> wouldn't generate that info, need only to process/serve it).
>
> *How would one have DSpace act as a backend for the BookReader client app?*
>
> The best idea I've come up with so far is to build a custom 
> media-filter that would interpret the word information contained in the 
> ALTO.xml files for each item and store this information in a new custom 
> Solr index, which would afterwards be queried by the client app. Every item 
> would have its own word index with information for each word (page, 
> width, height, vpos, hpos). This means the index entry would have to 
> be repeated for every word, and only the *hits* would be served to the 
> client app.
>
> For example, the following query:
>
> /solr/search/select?q=search.resourceid:=
>
> would return the information for all the occurrences of the *word* index 
> with the requested value (in the example above: Socrates). 
> If this could be accomplished, it would work, at least in theory.
>
> Does anyone have other ideas on this? Or has anyone implemented something 
> similar, or thought about it before?
>
> Sorry for the wall of text.
>
> Thanks as always,
>
> Pedro Amorim
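
To make the ALTO idea from the quoted message more concrete, here is a minimal Python sketch that turns ALTO String elements into the l/t/r/b boxes seen in the BookReader JSON above. The sample XML is hand-written for this example (its numbers were chosen to reproduce the box in the quoted JSON), and the mapping l = HPOS, t = VPOS, r = HPOS + WIDTH, b = VPOS + HEIGHT is an assumption inferred from that JSON, not a documented BookReader contract.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written ALTO-like fragment; real files are much larger.
SAMPLE_ALTO = """<alto>
  <Layout>
    <Page WIDTH="1243" HEIGHT="2123">
      <TextLine>
        <String CONTENT="Plato" HPOS="300" VPOS="358" WIDTH="90" HEIGHT="54"/>
        <String CONTENT="Socrates" HPOS="531" VPOS="358" WIDTH="163" HEIGHT="54"/>
      </TextLine>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}String' down to 'String'."""
    return tag.rsplit("}", 1)[-1]


def find_boxes(alto_xml: str, term: str, page: int) -> list:
    """Return one box dict (l, t, r, b, page) per String element matching `term`."""
    boxes = []
    for element in ET.fromstring(alto_xml).iter():
        if local_name(element.tag) != "String":
            continue
        if element.get("CONTENT", "").lower() != term.lower():
            continue
        left = int(float(element.get("HPOS")))
        top = int(float(element.get("VPOS")))
        boxes.append({
            "l": left,
            "t": top,
            "r": left + int(float(element.get("WIDTH"))),   # right edge
            "b": top + int(float(element.get("HEIGHT"))),   # bottom edge
            "page": page,
        })
    return boxes


boxes = find_boxes(SAMPLE_ALTO, "socrates", page=10)
print(boxes)
```

Whether these boxes are computed at indexing time and stored in Solr, or computed per request, is a separate design choice that this sketch does not address.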

[dspace-tech] Re: Downloads.

2017-04-10 Thread Mark Wood
On Tuesday, April 4, 2017 at 11:10:58 PM UTC-4, Mark Lamont wrote:
>
> I am in the process of setting up an information repository with DSpace 
> and am wondering if there is a way to make files available to users without 
> them being downloadable. Most of what I've submitted needs to be downloaded 
> to be viewed (or listened to, as the case may be). We would like to limit some 
> of the material to members of our organization, so we would like to be able to 
> make some files non-downloadable. Is there a way to do this with this 
> software, even though it is open access?
>


What you asked is not possible for any web service, since there is no way 
for the service to know what the client will do with a file that is sent to 
it.  As others have described, the only thing you can do is to grant READ 
permission to some logged-in users and deny any access by all others.



[dspace-tech] Re: dspace 5.6 rest api

2017-04-10 Thread Mark Wood
On Monday, April 3, 2017 at 4:12:42 PM UTC-4, Ricardo Campos wrote:
>
> Hi.
>  
>
>> Some further information on the problem.
>>
>
> I configured DSpace to give me some debug information and I found some 
> errors, all related to same thing:
>
> 2017-04-03 16:54:57,749 ERROR org.dspace.statistics.SolrLogger @ Failed 
> DNS Lookup for IP:0:0:0:0:0:0:0:1
>
> I wonder whether this error is responsible for the problems, and I cannot 
> figure out what causes these messages.
>
>

0:0:0:0:0:0:0:1 (sometimes expressed as "::1") is the IPv6 loopback address 
of the local host. o.d.statistics.SolrLogger is the class which sends 
events to Solr for calculating usage statistics.  The first thing I would 
suppose is that your 'curl' command connected to localhost via IPv6 and 
that SolrLogger is trying to back-resolve that address to a name but 
cannot.  That would suggest incomplete configuration of IPv6 name 
resolution:  the host can map 'localhost' to '::1' (so 'curl' succeeded) 
but cannot map '::1' to 'localhost' (so SolrLogger failed).  This would 
cause DSpace to fail to record object usages requested by the local host 
through IPv6, but I doubt that that is the cause of the null 
parentCommunity.
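
One way to check the reverse-resolution hypothesis above is to ask the operating system's resolver directly. The sketch below uses Python's standard socket module. It only approximates the lookup that DSpace performs in Java, so a result here is a hint rather than proof; the function name is made up for this example.

```python
import socket
from typing import Optional


def reverse_lookup(address: str) -> Optional[str]:
    """Return the host name for `address`, or None if reverse resolution fails."""
    try:
        host_name, _aliases, _addresses = socket.gethostbyaddr(address)
        return host_name
    except (socket.herror, socket.gaierror, OSError):
        return None


# Compare the IPv4 and IPv6 loopback addresses on this machine.
for address in ("127.0.0.1", "::1"):
    name = reverse_lookup(address)
    print(f"{address} -> {name if name else 'no reverse mapping'}")
```

If '::1' has no reverse mapping while '127.0.0.1' does, that would be consistent with the incomplete IPv6 name resolution described above.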



Re: [dspace-tech] What is in the assetstore?

2017-04-10 Thread Claudia Jürgen

Hello Evgeni,

Do you run cleanup as a cron job?
https://wiki.duraspace.org/display/DSDOC5x/Storage+Layer#StorageLayer-Cleanup


Hope this helps

Claudia Jürgen


On 10.04.2017 at 11:17, Evgeni Dimitrov wrote:

> Thank you Emilio,
>
> For every item, I ingest
> - the content (the files appearing in the bundle ORIGINAL)
> - mets.xml (appearing in the METADATA bundle)
> - a thumbnail (appearing in the THUMBNAIL bundle).
> DSpace creates one more - license.txt (in the LICENSE bundle).
>
> All these 3 additional bitstreams are listed in the result obtained with
>
> rest/items//bitstreams?offset=
>
> So the 3 additional files in every item are counted in my total of
> bitstreams - 335380.
> But there are 50487 more in the assetstore, which I cannot explain.
>
> Best regards
> Evgeni

--
Claudia Juergen
Eldorado

Technische Universität Dortmund
Universitätsbibliothek
Vogelpothsweg 76
44227 Dortmund

Tel.: +49 231-755 40 43
Fax: +49 231-755 40 32
claudia.juer...@tu-dortmund.de
www.ub.tu-dortmund.de



Re: [dspace-tech] What is in the assetstore?

2017-04-10 Thread Evgeni Dimitrov
Thank you Emilio,

For every item, I ingest
- the content (the files appearing in the bundle ORIGINAL) 
- mets.xml (appearing in the METADATA bundle)
- a thumbnail (appearing in the THUMBNAIL bundle).
DSpace creates one more - license.txt (in the LICENSE bundle).

All these 3 additional bitstreams are listed in the result obtained with

rest/items//bitstreams?offset=

So the 3 additional files in every item are counted in my total of 
bitstreams - 335380.
But there are 50487 more in the assetstore, which I cannot explain.

Best regards
Evgeni

On Monday, April 10, 2017 at 11:49:45 AM UTC+3, Emilio Lorenzo wrote:
>
> Hi, Evgeni 
>
> Yes, the assetstore can contain more files than you might have thought 
> of: in fact, anything belonging to items that is not metadata: 
>
> Thumbnails 
> full-text extraction 
> licenses (txt and creative commons) 
> etc... 
>
> regards 
> Emilio 
>
>
> > This is with DSpace 5.6. 
> > 
> > I had to provide for every top community 
> > - number of items 
> > - number of bitstreams 
> > 
> > I used REST - for items: 
> > 
> > rest/items?offset==parentCommunityList 
> > 
> > for bitstreams: 
> > 
> > rest/items//bitstreams?offset= 
> > 
> > The total for all top communities is 
> > - items 24601 
> > - bitstreams 335380 
> > 
> > Just to double-check I got the number of files in the assetstore 
> > 
> > find /assetstore -type f | wc -l 
> > 
> > It is 385867, much bigger than the number of bitstreams: 50487 files 
> > more. 
> > Are there other files in the assetstore apart from bitstreams, or maybe 
> > something in my counting is wrong? 


Re: [dspace-tech] What is in the assetstore?

2017-04-10 Thread elorenzo
Hi, Evgeni

Yes, the assetstore can contain more files than you might have thought
of: in fact, anything belonging to items that is not metadata:

Thumbnails
full-text extraction
licenses (txt and creative commons)
etc...

regards
Emilio


> This is with DSpace 5.6.
>
> I had to provide for every top community
> - number of items
> - number of bitstreams
>
> I used REST - for items:
>
> rest/items?offset==parentCommunityList
>
> for bitstreams:
>
> rest/items//bitstreams?offset=
>
> The total for all top communities is
> - items 24601
> - bitstreams 335380
>
> Just to double-check I got the number of files in the assetstore
>
> find /assetstore -type f | wc -l
>
> It is 385867, much bigger than the number of bitstreams: 50487 files
> more.
> Are there other files in the assetstore apart from bitstreams, or maybe
> something in my counting is wrong?


[dspace-tech] What is in the assetstore?

2017-04-10 Thread Evgeni Dimitrov
This is with DSpace 5.6.

I had to provide for every top community
- number of items
- number of bitstreams

I used REST - for items:

rest/items?offset==parentCommunityList

for bitstreams:

rest/items//bitstreams?offset=

The total for all top communities is
- items 24601
- bitstreams 335380
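
The paged counting described above could be sketched in Python roughly as follows. This is an illustration under assumptions: the limit, offset and expand query parameters and the base URL are assumed rather than taken from this message (the request lines above are incomplete), and the HTTP call is replaced by an in-memory stand-in so the paging logic can be run on its own.

```python
from typing import Callable, List


def page_url(base: str, offset: int, limit: int) -> str:
    """Build one page URL; `limit`, `offset` and `expand` are assumed parameters."""
    return f"{base}/items?limit={limit}&offset={offset}&expand=parentCommunityList"


def count_all(fetch_page: Callable[[int, int], List[dict]], limit: int = 100) -> int:
    """Count records by requesting pages until a short or empty page comes back."""
    total = 0
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        total += len(page)
        if len(page) < limit:
            return total
        offset += limit


# Stand-in for real HTTP calls: serves slices of an in-memory list.
fake_items = [{"id": n} for n in range(250)]


def fake_fetch(offset: int, limit: int) -> List[dict]:
    return fake_items[offset:offset + limit]


print(page_url("http://localhost:8080/rest", 0, 100))
print(count_all(fake_fetch))
```

Stopping on the first short page avoids needing a total count from the server, at the cost of one extra request when the total is an exact multiple of the page size.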

Just to double-check I got the number of files in the assetstore

find /assetstore -type f | wc -l

It is 385867, much bigger than the number of bitstreams: 50487 files more.
Are there other files in the assetstore apart from bitstreams, or maybe
something in my counting is wrong?
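
The comparison above can be reproduced with a short Python sketch. The count_files helper is written for this example and is meant to mirror find ... -type f | wc -l; the two totals are the figures from this message, and the temporary directory merely stands in for a real assetstore.

```python
import os
import tempfile


def count_files(root: str) -> int:
    """Count regular files under `root`, similar to `find root -type f | wc -l`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so that only regular files are counted.
            if os.path.isfile(path) and not os.path.islink(path):
                total += 1
    return total


# Figures reported in this message.
bitstreams_via_rest = 335380
files_in_assetstore = 385867
unexplained = files_in_assetstore - bitstreams_via_rest
print(f"files not accounted for: {unexplained}")

# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "10", "20"))
    for name in ("a", os.path.join("10", "b"), os.path.join("10", "20", "c")):
        with open(os.path.join(root, name), "w") as handle:
            handle.write("x")
    demo_count = count_files(root)
print(f"demo tree contains {demo_count} files")
```

Going by the replies in this thread, files left behind by deleted bitstreams that have not yet been removed by the cleanup task are one possible source of such a difference.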
