Does 4.3.1 already contain the mitigation for the Log4j2 vulnerability?

On Sun, Dec 12, 2021 at 1:24 PM Marco Neumann <marco.neum...@gmail.com> wrote:
> As Andy mentioned, I will give the 4.3.1 xloader a try with the new 4TB
> SSD drive and an old laptop.
>
> I also have a contact who has just set up a new datacenter in Ireland. I
> may be able to run a few tests on much bigger machines as well. Otherwise I
> am very happy with the iron in Finland, as long as they are dedicated
> machines.
>
> On Sun, Dec 12, 2021 at 12:44 PM Andy Seaborne <a...@apache.org> wrote:
>
>> On 11/12/2021 22:02, Marco Neumann wrote:
>> > Thank you Øyvind for sharing, great to see more tests in the wild.
>> >
>> > I did the test with a 1TB SSD / RAID1 / 64GB / ubuntu and the truthy
>> > dataset and quickly ran out of disk space. It finished the job but did
>> > not write any of the indexes to disk due to lack of space. No error
>> > messages.
>>
>> The 4.3.1 xloader should hopefully address the space issue.
>>
>> Andy
>>
>> > http://www.lotico.com/temp/LOG-95239
>> >
>> > I have now ordered a new 4TB SSD drive to rerun the test, possibly with
>> > the full wikidata dataset.
>> >
>> > I personally had the best experience with dedicated hardware so far
>> > (can be in the data center); shared or dedicated virtual compute
>> > engines did not deliver as expected. And I have not seen great benefits
>> > from data-center-grade multicore CPUs. But I think they will during
>> > runtime in multi-user settings (e.g. Fuseki).
>> >
>> > Best,
>> > Marco
>> >
>> > On Sat, Dec 11, 2021 at 9:45 PM Øyvind Gjesdal <oyvin...@gmail.com> wrote:
>> >
>> >> I'm trying out tdb2.xloader on an OpenStack VM, loading the wikidata
>> >> truthy dump downloaded 2021-12-09.
>> >>
>> >> The instance is a VM created on the Norwegian Research and Education
>> >> Cloud, an OpenStack cloud provider.
>> >>
>> >> Instance type:
>> >> 32 GB memory
>> >> 4 CPU
>> >>
>> >> The storage used for dump + temp files is mounted as a separate 900GB
>> >> volume on /var/fuseki/databases, with ext4 configured. The type of
>> >> storage is described as
>> >>> *mass-storage-default*: Storage backed by spinning hard drives,
>> >>> available to everybody and is the default type.
>> >> At the moment I don't have access to the faster volume type
>> >> mass-storage-ssd. CPU and memory are not dedicated, and can be
>> >> overcommitted.
>> >>
>> >> The OS for the instance is a clean Rocky Linux image, with no services
>> >> except Jena/Fuseki installed. The systemd service set up for Fuseki is
>> >> stopped. The Jena and Fuseki version is 4.3.0.
>> >>
>> >> openjdk 11.0.13 2021-10-19 LTS
>> >> OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
>> >> OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)
>> >>
>> >> I'm running from a tmux session to avoid connectivity issues and to
>> >> capture the output. I think the output is stored in memory and not on
>> >> disk.
>> >>
>> >> On the first run I tried to have the tmpdir on the root partition, to
>> >> separate the temp dir and the data dir, but with only 19 GB free, the
>> >> tmpdir soon filled the disk. For the second (current) run, all
>> >> directories are under /var/fuseki/databases.
>> >>
>> >> $JENA_HOME/bin/tdb2.xloader --loc /var/fuseki/databases/wd-truthy
>> >> --tmpdir /var/fuseki/databases/tmp latest-truthy.nt.gz
>> >>
>> >> The import is so far at the "ingest data" stage, where it has really
>> >> slowed down.
>> >>
>> >> Current output is:
>> >>
>> >> 20:03:43 INFO  Data :: Add: 502,000,000 Data (Batch: 3,356 / Avg: 7,593)
>> >>
>> >> See the full log so far at
>> >> https://gist.github.com/OyvindLGjesdal/c1f61c0f7d3ab5808144d9455cd383ab
>> >>
>> >> Some notes:
>> >>
>> >> * There is a (time/info) lapse in the output log between the end of
>> >> 'parse' and the start of 'index' for Terms. It is unclear to me what is
>> >> happening in the 1h13m between these lines:
>> >>
>> >> 22:33:46 INFO  Terms :: Elapsed: 50,720.20 seconds [2021/12/10 22:33:46 CET]
>> >> 22:33:52 INFO  Terms :: == Parse: 50726.071 seconds : 6,560,468,631 triples/quads 129,331 TPS
>> >> 23:46:13 INFO  Terms :: Add: 1,000,000 Index (Batch: 237,755 / Avg: 237,755)
>> >>
>> >> * The "ingest data" step really slows down: at the current rate, if I
>> >> calculated correctly, it looks like PKG.CmdxIngestData has 10 days left
>> >> before it finishes.
>> >>
>> >> * When I saw sort running in the background for the first parts of the
>> >> job, I looked at the `sort` command. I noticed from some online sources
>> >> that setting the environment variable LC_ALL=C improves speed for
>> >> `sort`. Could this be set on the ProcessBuilder for the `sort` process?
>> >> Could it break/change something? I see the warning from the man page
>> >> for `sort`:
>> >>
>> >>     *** WARNING *** The locale specified by the environment affects
>> >>     sort order. Set LC_ALL=C to get the traditional sort order that
>> >>     uses native byte values.
>> >>
>> >> Links:
>> >> https://access.redhat.com/solutions/445233
>> >> https://unix.stackexchange.com/questions/579251/how-to-use-parallel-to-speed-up-sort-for-big-files-fitting-in-ram
>> >> https://stackoverflow.com/questions/7074430/how-do-we-sort-faster-using-unix-sort
>> >>
>> >> Best regards,
>> >> Øyvind
>
> --
> ---
> Marco Neumann
> KONA

--
---
Marco Neumann
KONA
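For reference, the LC_ALL=C question from the thread can be set per child process without touching the parent JVM's locale. This is a minimal sketch (a hypothetical helper, not Jena's actual loader code) of launching `sort` from a ProcessBuilder with LC_ALL=C in the child's environment:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SortLauncher {
    // Start an external `sort` with LC_ALL=C in its environment.
    // With LC_ALL=C, sort compares raw byte values instead of applying
    // locale collation rules, which is typically much faster. The caveat
    // from the man page: the output order differs from locale-aware
    // order, so this is only safe if every consumer of the sorted output
    // also treats the lines as opaque byte strings.
    static Process startSort(String... sortArgs) throws IOException {
        List<String> cmd = new ArrayList<>();
        cmd.add("sort");
        cmd.addAll(List.of(sortArgs));
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.environment().put("LC_ALL", "C"); // byte-value ordering, child only
        return pb.start();
    }
}
```

If both the producer and the consumer of the sorted data compare bytes rather than locale-collated strings, the setting changes only speed, not correctness; whether that holds for the xloader's intermediate files is for the Jena developers to confirm.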