No, the systems are identical :)
[r...@wn140 ~]# hostname
wn140.grid.ucy.ac.cy
[r...@wn140 ~]# uname -a
Linux wn140.grid.ucy.ac.cy 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 19:07:47 CST 2009 i686 athlon i386 GNU/Linux
[r...@wn140 ~]# cat /etc/redhat-release
Scientific Linux SL release 4.7 (Beryllium)
[r...@wn140 pvfs-2.8.1]# more /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2214
stepping : 2
cpu MHz : 2200.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht pni syscall nx mmxext fxsr_opt rdtscp l
[r...@wn141 ~]# hostname
wn141.grid.ucy.ac.cy
[r...@wn141 ~]# uname -a
Linux wn141.grid.ucy.ac.cy 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 19:07:47 CST 2009 i686 athlon i386 GNU/Linux
[r...@wn141 ~]# cat /etc/redhat-release
Scientific Linux SL release 4.7 (Beryllium)
[r...@wn141 pvfs-2.8.1]# more /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2214
stepping : 2
cpu MHz : 2200.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht pni syscall nx mmxext fxsr_opt rdtscp lm
Patch applied, logs updated!
http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
thanks,
Asterios Katsifodimos
High Performance Computing systems Lab
Department of Computer Science, University of Cyprus
http://grid.ucy.ac.cy
On Mon, Apr 6, 2009 at 10:03 PM, Phil Carns <[email protected]> wrote:
That didn't show what I expected at all. It must have hit a safety
check on the request parameters. Could you try adding in the
attached patch as well?
What kind of systems are these? Are the two servers different
architectures by any chance?
thanks,
-Phil
Asterios Katsifodimos wrote:
Thanks!
I have applied the patch.
I have replaced the old logs with the new ones. Just use the
previous links.
http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
thanks a lot for your help,
On Mon, Apr 6, 2009 at 8:41 PM, Phil Carns <[email protected]> wrote:
Thanks for posting the logs. It looks like the create_list function within Trove actually generated the EINVAL error, but there aren't enough log messages in that path to know why.

Any chance you could apply the patch attached to this email and retry this scenario (with verbose logging)? I'm hoping for some extra output after the line that looks like this:

(0x8d4f020) batch_create (prelude sm) state: perm_check (status = 0)
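The patch should go against your pvfs-2.8.1 source tree with the usual patch-and-rebuild cycle, roughly like this (the patch file name below is only a placeholder for whatever the attachment is called, and the -p level may need adjusting depending on how it was generated):

  cd pvfs-2.8.1
  patch -p1 < trove-create-list-debug.patch   # placeholder name for the attached patch
  make && make install                        # then restart pvfs2-server on both nodes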
thanks,
-Phil
Asterios Katsifodimos wrote:
Yes, both of them, because now both are metadata servers. When I had one metadata server and one I/O server, the metadata server was not producing the errors until the I/O server came up. From the moment the I/O server comes up, the metadata server goes crazy...
I have uploaded the log files here:
http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
have a look!
thanks!
On Mon, Apr 6, 2009 at 7:00 PM, Phil Carns <[email protected]> wrote:
Ok. Could you try "verbose" now as the log level? It is close to the "all" level but should only print information while the server is busy.
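In the config file that just means changing the EventLogging line in the <Defaults> section (quoted further down in this thread) from "all" to:

  EventLogging verbose

and restarting both servers.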
Are both wn140 and wn141 showing the same batch create errors, or just one of them?
thanks,
-Phil
Asterios Katsifodimos wrote:
Hello Phil,
Thanks for your answer.
Yes, I delete the storage dir every time I make a new configuration, and I run the pvfs2-server -f command before starting the daemons.
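Concretely, on each node I do roughly the following (the storage dir and config path match the config file quoted below):

  rm -rf /pvfs                         # remove the old storage space
  pvfs2-server /etc/pvfs2-fs.conf -f   # recreate the storage space
  pvfs2-server /etc/pvfs2-fs.conf      # start the daemon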
The only thing that I get from the servers is the batch_create error, the starting-server message, and the "PVFS2 server got signal 15 (server_status_flag: 507903)" error message. Do you want me to try another log level?
Also, this is how the server is configured:
***** Displaying PVFS Configuration Information *****
------------------------------------------------------
PVFS2 configured to build karma gui              : no
PVFS2 configured to perform coverage analysis    : no
PVFS2 configured for aio threaded callbacks      : yes
PVFS2 configured to use FUSE                     : no
PVFS2 configured for the 2.6.x kernel module     : no
PVFS2 configured for the 2.4.x kernel module     : no
PVFS2 configured for using the mmap-ra-cache     : no
PVFS2 will use workaround for redhat 2.4 kernels : no
PVFS2 will use workaround for buggy NPTL         : no
PVFS2 server will be built                       : yes
PVFS2 version string: 2.8.1
thanks again,
On Mon, Apr 6, 2009 at 5:21 PM, Phil Carns <[email protected]> wrote:
Hello,

I'm not sure what would cause that "Invalid argument" error. Could you try the following steps:

- kill both servers
- modify your configuration files to set "EventLogging" to "none"
- delete your old log files (or move them to another directory)
- start the servers

You can then send us the complete contents of both log files and we can go from there. The "all" level is a little hard to interpret because it generates a lot of information even when servers are idle.
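On each server that would look something like this (the log and config paths below are taken from your setup; adjust as needed):

  killall pvfs2-server                                # kill both servers
  # set "EventLogging none" in the <Defaults> section of the config
  mv /tmp/pvfs2-server.log /tmp/pvfs2-server.log.old  # set the old log aside
  pvfs2-server /etc/pvfs2-fs.conf                     # start the server again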
Also, when you went from one server to two, did you delete your old storage space (/pvfs) and start over, or are you trying to keep that data and add servers to it?
thanks!
-Phil
Asterios Katsifodimos wrote:
Hello all,

I have been trying to install PVFS 2.8.1 on Ubuntu Server, CentOS 4 and Scientific Linux 4. I compile it and can run it on a "single host" configuration without any problems.

However, when I add more nodes to the configuration (always using the pvfs2-genconfig defaults) I have the following problem:
*On the metadata node I get these messages:*

[E 04/02 20:16] batch_create request got: Invalid argument
[E 04/02 20:16] batch_create request got: Invalid argument
[E 04/02 20:16] batch_create request got: Invalid argument
[E 04/02 20:16] batch_create request got: Invalid argument
*In the IO nodes I get:*
[r...@wn140 ~]# tail -50 /tmp/pvfs2-server.log
[D 04/02 23:53] BMI_testcontext completing: 18446744072456767880
[D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 1)
[D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
[D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 1 msgarray_count 1
[D 04/02 23:53] msgpairarray: 1 operations remain
[D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: -1073742006), (action: DEFERRED)
[D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 0)
[D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
[D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 0 msgarray_count 1
[D 04/02 23:53] msgpairarray: all operations complete
[D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: 190), (action: COMPLETE)
[D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:completion_fn (status: 0)
[D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
[D 04/02 23:53] (0x88f8b00) msgpairarray state: completion_fn
[E 04/02 23:53] Warning: msgpair failed to tcp://wn141:3334, will retry: Connection refused
[D 04/02 23:53] *** msgpairarray_completion_fn: msgpair 0 failed, retry 1
[D 04/02 23:53] *** msgpairarray_completion_fn: msgpair retrying after delay.
[D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:completion_fn (error code: 191), (action: COMPLETE)
[D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:post_retry (status: 0)
[D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
[D 04/02 23:53] msgpairarray_post_retry: sm 0x88f8b00, wait 2000 ms
[D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:post_retry (error code: 0), (action: DEFERRED)
[D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
[P 04/02 23:53] Start times (hr:min:sec): 23:53:11.330 23:53:10.310 23:53:09.287 23:53:08.268 23:53:07.245 23:53:06.225
[P 04/02 23:53] Intervals (hr:min:sec)  : 00:00:01.026 00:00:01.020 00:00:01.023 00:00:01.019 00:00:01.023 00:00:01.020
[P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
[P 04/02 23:53] bytes read          : 0 0 0 0 0 0
[P 04/02 23:53] bytes written       : 0 0 0 0 0 0
[P 04/02 23:53] metadata reads      : 0 0 0 0 0 0
[P 04/02 23:53] metadata writes     : 0 0 0 0 0 0
[P 04/02 23:53] metadata dspace ops : 0 0 0 0 0 0
[P 04/02 23:53] metadata keyval ops : 1 1 1 1 1 1
[P 04/02 23:53] request scheduler   : 0 0 0 0 0 0
[D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
[D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
[D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
[D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
[P 04/02 23:53] Start times (hr:min:sec): 23:53:12.356 23:53:11.330 23:53:10.310 23:53:09.287 23:53:08.268 23:53:07.245
[P 04/02 23:53] Intervals (hr:min:sec)  : 00:00:01.020 00:00:01.026 00:00:01.020 00:00:01.023 00:00:01.019 00:00:01.023
[P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
[P 04/02 23:53] bytes read          : 0 0 0 0 0 0
[P 04/02 23:53] bytes written       : 0 0 0 0 0 0
[P 04/02 23:53] metadata reads      : 0 0 0 0 0 0
[P 04/02 23:53] metadata writes     : 0 0 0 0 0 0
[P 04/02 23:53] metadata dspace ops : 0 0 0 0 0 0
[P 04/02 23:53] metadata keyval ops : 1 1 1 1 1 1
[P 04/02 23:53] request scheduler   : 0 0 0 0 0 0
[D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
[D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
[D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
The metadata node keeps asking for something that the I/O nodes cannot provide correctly, so it complains. This keeps both the I/O nodes and the metadata node from working.
I have installed these services many times. I have tested this using Berkeley DB 4.2 and 4.3 on Red Hat systems (CentOS, Scientific Linux) and on Ubuntu Server. I have also tried PVFS version 2.6.3 and I get the same problem.
*My config files look like:*
[r...@wn140 ~]# more /etc/pvfs2-fs.conf
<Defaults>
UnexpectedRequests 50
EventLogging all
EnableTracing no
LogStamp datetime
BMIModules bmi_tcp
FlowModules flowproto_multiqueue
PerfUpdateInterval 1000
ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 300
ClientJobFlowTimeoutSecs 300
ClientRetryLimit 5
ClientRetryDelayMilliSecs 2000
PrecreateBatchSize 512
PrecreateLowThreshold 256
StorageSpace /pvfs
LogFile /tmp/pvfs2-server.log
</Defaults>
<Aliases>
Alias wn140 tcp://wn140:3334
Alias wn141 tcp://wn141:3334
</Aliases>
<Filesystem>
Name pvfs2-fs
ID 320870944
RootHandle 1048576
FileStuffing yes
<MetaHandleRanges>
Range wn140 3-2305843009213693953
Range wn141 2305843009213693954-4611686018427387904
</MetaHandleRanges>
<DataHandleRanges>
Range wn140 4611686018427387905-6917529027641081855
Range wn141 6917529027641081856-9223372036854775806
</DataHandleRanges>
<StorageHints>
TroveSyncMeta yes
TroveSyncData no
TroveMethod alt-aio
</StorageHints>
</Filesystem>
My setup is made of two nodes that are both I/O and metadata nodes. I have also tried a 4-node setup with 2 I/O and 2 metadata nodes, with the same result.
Any suggestions?
thank you in advance,
--
Asterios Katsifodimos
High Performance Computing systems Lab
Department of Computer Science, University
of Cyprus
http://www.asteriosk.gr
------------------------------------------------------------------------
_______________________________________________
Pvfs2-users mailing list
[email protected]
<mailto:[email protected]>
<mailto:[email protected]
<mailto:[email protected]>>
<mailto:[email protected]
<mailto:[email protected]>
<mailto:[email protected]
<mailto:[email protected]>>>
<mailto:[email protected]
<mailto:[email protected]>
<mailto:[email protected]
<mailto:[email protected]>>
<mailto:[email protected]
<mailto:[email protected]>
<mailto:[email protected]
<mailto:[email protected]>>>>
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users