We only have .pdf and .doc files in our repository, so as it turned out,
the only rows in the bundle table that had a value in
primary_bitstream_id were the problematic ones.  I wonder how it
happened, though.  It may have something to do with the admin
interface, where you can delete a document and then upload another one,
probably in the same session.  We don't do much of this, which would
fit with our having only about 12 mismatched rows in the table.  In most of
the cases, there were 3 rows in the bitstream table tied to a single
Item - two .pdf rows and one .pdf.txt row, and one of the .pdf rows was
marked as "deleted".  In one odd case though, I found two .pdf rows and
two .pdf.txt rows for the same Item and all four were marked as
"deleted", yet there was still a value in the associated bundle row in
primary_bitstream_id.  Oh, and there was *no* associated
bundle2bitstream row.  Strange.

 

Sue

 

________________________________

From: Diggory Mark [mailto:mdigg...@gmail.com] 
Sent: Wednesday, December 10, 2008 6:13 PM
To: Thornton, Susan M. (LARC-B702)[NCI INFORMATION SYSTEMS]
Cc: dspace-tech@lists.sourceforge.net; Smail, James W. (LARC-B702)[NCI
INFORMATION SYSTEMS]
Subject: Re: [Dspace-tech] "Cleanup" cron question

 

However, primary_bitstream_id is used by the application even in 1.5 to
identify the case when only one bitstream should be exposed in the UI
(for instance an Item containing a website in HTML).  This is not
functionality that has "gone away".

 

-Mark

 

On Dec 10, 2008, at 2:34 PM, Thornton, Susan M. (LARC-B702)[NCI
INFORMATION SYSTEMS] wrote:





Wow!  I just answered my own question, thanks to an idea I got from
Brian's reply to my question ("It seems to me I've seen that the
primary_bitstream_id and the mets_bitstream_id are both unused fields,
and should be Null." :-)).  Yes, the cleanup job DOES physically delete
all the rows in the bitstream table where "deleted is true".  I ran the
following SQL query to correct the referential integrity problems between
the bundle and bitstream tables:

 

UPDATE bundle
   SET primary_bitstream_id = NULL
 WHERE primary_bitstream_id > 0;

 

Once I ran this query (it updated 7 rows in the bundle table), I was
able to get "cleanup" to run to successful completion, and YES - it
deleted all those rows in the bitstream table that were "marked" for
deletion.
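
Because primary_bitstream_id can still be meaningful for bundles that
should expose a single bitstream, a narrower variant of the update may
be worth considering: clear only the references that point at bitstream
rows already marked as deleted.  The following is a sketch only, built
from the table and column names mentioned in this thread
(bundle.primary_bitstream_id, bitstream.bitstream_id,
bitstream.deleted); it has not been run against a DSpace database, and
the helper name print_narrow_fix_sql is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints an UPDATE that clears primary_bitstream_id
# only where it references a bitstream row flagged deleted = true.
# Column names are taken from the thread; verify them against your schema.
print_narrow_fix_sql() {
    cat <<'EOF'
UPDATE bundle
   SET primary_bitstream_id = NULL
 WHERE primary_bitstream_id IN
       (SELECT bitstream_id FROM bitstream WHERE deleted = true);
EOF
}
```

Printing the statement first makes it easy to review before running it,
for example with "print_narrow_fix_sql | psql dspace" (the database name
dspace is an assumption).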

 

For those of you who either don't have this problem or don't KNOW you
have this problem, look in dspace.log after your "cleanup" job runs.
You may see something like:

 

2008-12-10 15:19:00,921 FATAL org.dspace.storage.bitstore.Cleanup @ Caught exception:
org.postgresql.util.PSQLException: ERROR: update or delete on table "bitstream" violates foreign key constraint "$2" on table "bundle"
  Detail: Key (bitstream_id)=(43213) is still referenced from table "bundle".
            at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1592)
            at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1327)
            at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:193)
            at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
            at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:351)
            at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:305)
            at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:101)
            at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:101)
            at org.dspace.storage.rdbms.DatabaseManager.updateQuery(DatabaseManager.java:519)
            at org.dspace.storage.rdbms.DatabaseManager.updateQuery(DatabaseManager.java:547)
            at org.dspace.storage.rdbms.DatabaseManager.deleteByValue(DatabaseManager.java:702)
            at org.dspace.storage.rdbms.DatabaseManager.delete(DatabaseManager.java:669)
            at org.dspace.storage.bitstore.BitstreamStorageManager.cleanup(BitstreamStorageManager.java:663)
            at org.dspace.storage.bitstore.Cleanup.main(Cleanup.java:109)
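
A quick way to check for this condition is to count the FATAL lines
that the Cleanup class has written to the log.  The sketch below
assumes the log line format shown above; the function name
count_cleanup_fatals and the log path in the usage note are
hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints how many lines in the given log file were
# logged at FATAL level by org.dspace.storage.bitstore.Cleanup.
count_cleanup_fatals() {
    # grep -c prints the match count but exits 1 when the count is 0,
    # so "|| true" keeps the function's own exit status at 0.
    grep -c 'FATAL org.dspace.storage.bitstore.Cleanup' "$1" || true
}
```

For example, "count_cleanup_fatals /dspace/log/dspace.log" (the path is
an assumption; use wherever your installation writes dspace.log).  Any
count above zero means at least one cleanup run hit a fatal error.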

 

 

The reason you may not know you have the problem is that the
cleanup cron script does not check the return code after executing.  I just
happened to notice the above error in dspace.log one day and, after
adding the following code to the cleanup cron script, I began to see that
cleanup never completed successfully:

************************************************************************
<original code>
# Shell script for cleaning the asset store.

# Get the DSPACE/bin directory
BINDIR=`dirname $0`

echo "Cleaning the asset store"

$BINDIR/dsrun org.dspace.storage.bitstore.Cleanup

<I added the following code:>
# Check to see if the program executed successfully
if [ "$?" -ne "0" ]; then
   echo "cleanup cron failed"
   exit 1
fi

echo "cleanup cron completed successfully!"
exit 0
************************************************************************
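
The same exit-status check can be wrapped in a small function so that
the failing status is passed along rather than flattened to 1.  This is
only a sketch of the idea, not part of the DSpace distribution; the
name run_checked is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: runs a command, reports the outcome, and returns
# the command's own exit status so cron or a caller can act on it.
run_checked() {
    label=$1
    shift
    if "$@"; then
        echo "$label completed successfully"
        return 0
    else
        # In the else branch, $? still holds the status of the command
        # that was tested by "if".
        status=$?
        echo "$label failed with exit status $status" >&2
        return "$status"
    fi
}
```

In the cleanup script this might be called as
run_checked "cleanup cron" "$BINDIR/dsrun" org.dspace.storage.bitstore.Cleanup
with the failure message going to stderr, where cron will usually mail
it to the job owner.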

 

Thanks to everyone who responded to my post!

Happy Holidays!!

Sue

 

________________________________

From: Thornton, Susan M. (LARC-B702)[NCI INFORMATION SYSTEMS]
[mailto:susan.m.thorn...@nasa.gov] 
Sent: Wednesday, December 10, 2008 3:24 PM
To: dspace-tech@lists.sourceforge.net
Cc: Smail, James W. (LARC-B702)[NCI INFORMATION SYSTEMS]
Subject: [Dspace-tech] "Cleanup" cron question

 

     In DSpace 1.4.2, what exactly does the "cleanup" job do?  For one
thing, I think it deletes assetstore entries for online records in
DSpace that have been deleted; however, I'm wondering if it also deletes
"replaced" bitstream rows.  For example, let's say a document for an
Item is corrupt and you "delete" that document and upload a new one.  The
bitstream row for the original document doesn't actually get deleted in
DSpace; instead, column bitstream.deleted gets set to "true".  Is it the
cleanup job that is actually supposed to physically delete that old row
in the database?  If not, how/when does it actually get deleted?  I just
noticed that we have over 2,100 "duplicate" document names in the
database, mostly with one record marked as "deleted=true" and one marked
as "deleted=false".

 

     Also, has anyone reported having problems with "cleanup" failing
due to referential integrity problems with the "bundle" table?  We've
had that problem for a looong time now and I just recently figured out
how to correct it.

Thanks,

Sue

 

 

Sue Walker-Thornton

ConITS Contract
NASA Langley Research Center
Integrated Library Systems Application & Database Administrator

130 Research Drive

Hampton, VA  23666

Office: (757) 224-4074
Fax:    (757) 224-4001
Pager: (757) 988-2547 
Email:  susan.m.thorn...@nasa.gov <mailto:susan.m.thorn...@nasa.gov> 

 

------------------------------------------------------------------------------
SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada.
The future of the web can't happen without you.  Join us at MIX09 to help
pave the way to the Next Web now. Learn more and register at
http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech

 

~~~~~~~~~~~~~

Mark R. Diggory

http://purl.org/net/mdiggory/homepage

 

 

 
